FOREWORD


The organisers and hosts take great pleasure in presenting to you the Proceedings of the
Second International Conference on Electronic Circuits and Systems, ECS'99, and in extending
the warmest welcome to the conference and the joint “Experts Session”. ECS'99 follows
the successful and memorable conference which was held here in September 1997. The
event is primarily devoted to a wide spectrum of activities and the latest results in the rapidly
evolving information society, where electronic, and particularly microelectronic and
communication, technologies play an important role, which also brings about a swift
expansion in the field of microsystems and a wide variety of their applications. Another
aspiration of the conference and of the joint Experts Session is to enhance the transfer of
high-tech towards our industrial partners, particularly SMEs.

We have received up to 100 submissions from 25 countries. After a
thorough reviewing process, the members of the International Program Committee have
selected 46 submitted papers for oral presentation and 10 papers as posters. The invited,
globally renowned speakers will cover the main conference topics. The invited talks, expert
session, contributed oral presentations and poster sessions, along with many interesting
personal meetings during coffee breaks, lunches and shared evenings in Bratislava, will give
all of us excellent chances for fruitful discussions and dissemination of recent achievements.
It is our pleasure to thank all authors and participants for the effort they made to prepare their
contributions. We would also like to express our gratitude to the members of the International
Program Committee for their effort and contribution to the quality of the conference.

The contributed and invited papers are presented in the Proceedings as extended abstracts,
which were not subject to further reviewing except for non-technical editorial aspects. Thus,
the contents of the papers are the sole responsibility of the authors.

We believe that ECS'99 will continue the aspiration of ECS'97 to establish
a forum for regular European meetings bringing together leading experts from around the world,
particularly from the countries of Europe. Frequent and intense international collaboration,
globalisation, the contacts of the Department of Microelectronics of the Slovak
University of Technology in Bratislava, and the geographical location of Bratislava on the
border of three countries, very close to the International Airport of Vienna, support this idea as
well. The importance of the conference has been underlined by the participation and support of
the Ministry of Education of the Slovak Republic.

Technical co-sponsorship of the IEEE Computer Society, Test Technology Technical
Council, as well as the support of the IEE through its Slovak Centre, is appreciated. The support
of the co-sponsoring organisations contributes to the quality of the event as well.

We wish to thank the European Office of Aerospace Research and Development, Air
Force Office of Scientific Research, United States Air Force Research Laboratory for their
contribution to the success of this conference.

The editors are very grateful to everybody who helped to bring this conference to
fruition, especially the conference secretariat, the members of the program and organising
committees, and all those who accepted invitations to present talks or who submitted papers
for consideration.

On behalf of the Organising Committee, we would like to wish all of you a great deal of
success in your scientific discussions and a pleasant and memorable stay in Bratislava, the
capital of the Slovak Republic.


Viera  Stopjakova 
Organising  Committee  Chairman 


Daniel  Donoval 
General  Chairman 
System-on-Chip: Test and Diagnosis

Yervant  Zorian,  Ph.D, 

Chief  Technology  Advisor, 

LogicVision,  Inc. 

101  Metro  Dr.,  Third  Floor 
San  Jose,  CA  95110,  USA 
zorian@logicvision.com

Abstract: As system-on-chip (SOC) complexity and the move to very deep submicron (VDSM)
technology push the threshold of semiconductor technology, conventional test methods
become  inadequate  and  costly.  This  new  level  of  complexity  demands  that  designers  alter  the 
way  they  approach  chip  development  in  order  to  keep  up  with  diminishing  time-to-market 
requirements  and  stay  within  budgets.  Embedded  test  enables  customers  to  produce  higher- 
quality  products  in  less  time.  The  use  of  embedded  test  raises  margins  and  significantly  reduces 
the  time  required  for  system  verification,  test  and  debug.  The  speaker  will  address  chip-  and 
board-  level  signal  integrity  issues,  system  architecture  design,  business  (time  to  market), 
embedded  systems  (design  considerations  for  embedded  systems,  testing  real-time  systems, 
systems  integration),  test  (high-density  design  issues,  mixed-signal  testing,  digital  testing  issues, 
test technologies - IDDQ, SCAN, design for testability), SOC integration/test issues (making
SOC a reality), and the importance of embedded test and front-end (time to money, quality and
cost).

1.  Introduction: 

The  electronics  revolution  we  are  witnessing  today  is  driven  by  market  demands  to  provide 
better,  cheaper,  smaller  and  faster  products  while  meeting  users'  quality  requirements.  Meeting 
these  quality  requirements  necessitates  performing  adequate  test  and  diagnosis  procedures.  This 
article concentrates on the dynamic changes in the electronics industry and their impact on test and
diagnosis.

The  market  driven  electronics  industry  keeps  introducing  products  with  greater  functionality, 
higher  reliability,  lower  costs  and  shorter  product  realization  intervals.  These  are  realized  by  the 
unprecedented advancements in semiconductor IC technology. Semiconductor ICs are
considered the foundation of modern products, even traditionally non-electronic ones.
Semiconductor transistors are becoming so cheap and commonly available that whole industries
now live on continuously integrating more and more functions into smaller and smaller
packages, hence creating systems-on-chip. Being able to rapidly develop, manufacture, test,
diagnose and verify complex new chips and products using such chips is crucial for the
continued success of our economy at large. This growth is expected to continue full force at
least for the next decade, while making possible the production of 100 million transistor chips.
However, to make their production practical and cost effective, the National Technology Roadmap
for  Semiconductors  identified  in  1997  a  number  of  major  hurdles  to  be  overcome.  Some  of  these 
hurdles  are  related  to  test  technology.  This  article  analyzes  these  hurdles,  relates  them  to  the 
advancements  in  semiconductor  technology  and  presents  potential  solutions  to  address  them. 
These  solutions  are  meant  to  ensure  that  test  contributes  to  the  overall  growth  of  the 
semiconductor  industry  and  does  not  slow  it  down. 
Test is a critical technology in the semiconductor production process. On the one hand, IC test is
performed multiple times during volume production to screen the ICs as they are manufactured.
IC test starts with wafer probing, even before patterned wafers are diced, and continues with
individually packaged chips. On the other hand, IC test also plays a key role in analyzing defects
in the semiconductor manufacturing process. The feedback derived from the test is the only way
to analyze and isolate many of the defects in today's processes. Time-to-yield, time-to-market and
time-to-quality are all gated by test. Moreover, ICs are tested at each additional manufacturing
step beyond IC production because each step can introduce new defects. With the increasing
need for high-quality electronic products, at each new physical assembly level, such as board
and system assembly, IC test is used for debugging, diagnosing and repairing the sub-assemblies
in their new environment. Similarly, with increasing reliability, availability and serviceability
requirements, most users of high-end products perform periodic tests in the field throughout the
full life cycle. As semiconductor technology keeps moving towards the creation of monster
chips, we will continue to confront three key scaling trends: greater complexity, increased
performance, and higher density.

To allow advancements in each of the three scaling trends above, fundamental changes are
expected to emerge in different IC realization disciplines such as IC design, packaging and
silicon process. These changes have a direct impact on the test methods, tools and equipment
adopted. Test must keep up with the pace of such changes to ensure that 100 million transistor
monster chips are adequately tested, diagnosed, measured, debugged and sometimes even repaired.

In  the  following,  we  will  take  the  three  key  scaling  trends  one  at  a  time,  observe  their 
implications  on  test,  identify  the  key  hurdles/challenges  and  discuss  the  potential  solutions. 
Each  of  the  three  challenges  our  ability  to  efficiently  create  new  products.  It  is  not  sufficient  to 
address  just  one  of  the  challenges;  all  must  be  met  at  the  same  time. 

2.  Implications  of  Increased  Complexity: 

Moore's law predicts how the achievable transistor count per chip grows over time. The
Semiconductor Industry Association's (SIA's) National Technology Roadmap for
Semiconductors lays out the consequences of that prediction. The growth rate in IC transistor
count is far higher than the rate for IC pins. The drastic increase in the ratio of transistors per pin
continuously reduces the accessibility of the transistors from the chip pins. This limited
accessibility of internal transistors is a big problem for IC test.

The problems associated with limited Input/Outputs (I/Os) that have to be overcome do not end
with access difficulties. There is a growing disparity between internal clock frequencies and the
output capability of I/Os. This I/O limitation makes at-speed performance testing of an IC very
difficult, if not impossible. Combining the roadmap numbers for transistor count, chip I/O count,
cost, chip internal frequency and I/O switching speeds reveals that the rate of growth for how
much information can be generated and consumed inside a chip (internal bandwidth, defined as
number of transistors per IC times internal switching frequency) outpaces by far the rate at
which the available I/O bandwidth grows (number of I/Os times I/O switching speed), see Fig.
(1). At the same time, the cost of package pins declines much more slowly than the cost of a
transistor. The physical characteristics of I/Os must remain at a macroscopic level dictated by
chip attachment and board manufacturing constraints, whereas the silicon feature sizes are
rapidly moving down from micrometers to nanometers. In other words, the chip I/Os and
board-level interfaces don't scale physically at nearly the rate of the internal circuits, contributing
to a growing number of transistors behind each chip I/O and a widening performance gap between
the chip internals and the I/O interface.
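
As a rough illustration of the two bandwidth definitions given above, the following Python sketch computes the internal bandwidth (transistors times internal switching frequency), the I/O bandwidth (I/Os times I/O switching speed) and their ratio. The function names and the two sets of chip figures are hypothetical placeholders chosen only to show the trend; they are not roadmap values.

```python
def internal_bandwidth(transistors, internal_hz):
    """Internal bandwidth as defined in the text: transistors x switching frequency."""
    return transistors * internal_hz


def io_bandwidth(io_count, io_hz):
    """External bandwidth as defined in the text: number of I/Os x I/O switching speed."""
    return io_count * io_hz


def bandwidth_gap(transistors, internal_hz, io_count, io_hz):
    """Ratio of internal to external bandwidth (dimensionless)."""
    return internal_bandwidth(transistors, internal_hz) / io_bandwidth(io_count, io_hz)


# Hypothetical chip generations, for illustration only.
older_chip = dict(transistors=1_000_000, internal_hz=100_000_000,
                  io_count=200, io_hz=50_000_000)
newer_chip = dict(transistors=100_000_000, internal_hz=1_000_000_000,
                  io_count=1000, io_hz=200_000_000)

older_gap = bandwidth_gap(**older_chip)
newer_gap = bandwidth_gap(**newer_chip)
print(f"older gap: {older_gap:.0f}x, newer gap: {newer_gap:.0f}x")
```

With these assumed inputs, the transistor count and clock rate grow much faster than the pin count and pin speed, so the ratio widens from one generation to the next, which is the effect the text describes.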

The data from the SIA Roadmaps reveal that the gap between external bandwidth and internal
bandwidth will grow at a fast pace, see Fig. (1). This bandwidth gap is the main reason why we
are starting to see processors and DRAMs being integrated into the same chip, rather than
interacting with each other through the limited bandwidth of chip I/Os, as was done
traditionally when the bandwidth gap was negligible.

The I/O bandwidth has had a major impact on chip test methods. In the very early days of IC
technology (100 transistors per chip), the bandwidth gap was negligible. The test data was
applied to the chip I/Os directly from an external test data Source, and the response data was
received from the chip I/Os and evaluated for correctness by an external Sink, see Fig. (3-a).
The combination of the external Source and Sink, the test control software embedded in it, and the
external test access mechanism (connecting the IC pins to the Source/Sink) represents the
external test equipment.


Figure (1) External vs. Internal Input/Output Bandwidth (curves labelled Internal and External)


With the next generation of IC technology, the transistor-per-pin ratio reached a level where the
resulting IC complexity (10,000 transistors per IC) made sole dependence on external test
equipment insufficient. Hence, the concept of embedded test was introduced. This meant
embedding test capabilities beyond the primary I/Os and into the internal transistors of the chip.
One of the early embedded test techniques was scan path, which reduced the test complexity by
extending the test access mechanism into the internal transistors, see Fig. (3-b). The process of
incorporating embedded test circuits into the design of a chip became known as
design-for-testability. Embedded test hardware, namely scan paths in this case, facilitated the
transport of test data from the chip I/Os and its application to a large number of internal transistors.
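
The scan-path idea described above can be sketched as a toy Python model: internal storage cells are chained into a shift register, so that a pattern enters serially through one pin, the internal logic response is captured, and the response leaves serially through another pin. The class `ScanChain`, its method names and the three-cell `logic_under_test` function are all hypothetical and only illustrate the mechanism; they do not describe any particular scan implementation.

```python
class ScanChain:
    """Toy scan path: internal cells connected as one serial shift register."""

    def __init__(self, length):
        self.cells = [0] * length

    def shift(self, scan_in_bit):
        """One scan clock: a bit enters at the head, the tail bit leaves on scan-out."""
        scan_out_bit = self.cells[-1]
        self.cells = [scan_in_bit] + self.cells[:-1]
        return scan_out_bit

    def load(self, pattern):
        """Shift a full pattern in so that cells[i] ends up holding pattern[i]."""
        for bit in reversed(pattern):
            self.shift(bit)

    def capture(self, logic):
        """One functional clock: the cells capture the response of the internal logic."""
        self.cells = logic(self.cells)

    def unload(self):
        """Shift the captured response out serially, tail cell first."""
        return [self.shift(0) for _ in range(len(self.cells))]


def logic_under_test(state):
    """Hypothetical combinational block sitting between the scan cells."""
    a, b, c = state
    return [a & b, b | c, a ^ c]


chain = ScanChain(3)
chain.load([1, 0, 0])              # stimulus delivered through a single pin
chain.capture(logic_under_test)    # cells now hold [0, 0, 1]
response = chain.unload()          # observed through a single pin
print(response)
```

Only two pins (scan-in and scan-out) are needed to control and observe all three internal cells, at the price of one clock cycle per bit shifted.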
The amount of test data needed for testing an IC with a certain level of fault coverage grows
with the transistor count of the IC under test. For embedded test based on scan paths, the
growth rate is proportional to the growth in transistor count. When the external Source and Sink
provide all stimulus and response data through the chip I/Os, the pin buffer depth needed to apply
the test has to grow with the number of transistors per pin. With complexities beyond a million
transistors per IC, the pin buffer depth started to become a major concern. Moreover, with this
same IC generation the external test equipment started to confront another key hurdle, the
drastic increase in silicon speed. In order to test an IC at its system speed, a semiconductor
producer had two options. The first was to stay with the existing approach of an external Source
and Sink, which necessitates high-bandwidth test interaction between the IC and its external
Source and Sink, as in Fig. (3-c); external test equipment was often incapable of this. The second
was to introduce low-bandwidth communication with the IC by shifting certain features from the
external Source and Sink to an embedded Source and Sink, see Fig. (3-d). The shifted features were
those related to test speed and data volume. The Source and Sink features that required only
low-bandwidth interaction remained in the external test equipment. This drastically reduced the
complexity of the external tester.


Figure (3) External and Embedded Test. Panels: (3-a) External Test only; (3-b), (3-c) and (3-d)
External Test combined with Embedded Test.

As in Fig. (3-d), the embedded Source performs an expansion function, generating a large volume
of at-speed test data and applying it to the circuit under test, whereas the embedded Sink
collects the response data and performs an at-speed compaction function. This reduces the
complexity of the external Source and Sink and allows at-speed performance test. Without such
a scheme, at-speed test might have been impossible, since external test equipment is often
built using yesterday's technology and would yield slow test throughput with long scan paths.
While the embedded Source and Sink help avoid the bandwidth limitation of the external test
equipment, they do not perform total Built-In Self-Test (BIST), since they still depend on
the external test equipment.
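
The expansion and compaction roles described above can be sketched in Python. The text does not say how the embedded Source and Sink are built; this sketch assumes one common realization, a linear feedback shift register (LFSR) that expands a short seed into many patterns and a signature register that compacts all responses into one short word. The 4-bit width, the feedback taps, and the functions `good_circuit` and `faulty_circuit` are hypothetical choices made for illustration.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1


def lfsr_step(state):
    """Advance a 4-bit LFSR by one clock (feedback taps on the two top bits)."""
    feedback = ((state >> 3) ^ (state >> 2)) & 1
    return ((state << 1) | feedback) & MASK


def source_patterns(seed, count):
    """Embedded Source: expand one small seed into `count` test patterns."""
    state = seed
    patterns = []
    for _ in range(count):
        patterns.append(state)
        state = lfsr_step(state)
    return patterns


def sink_signature(responses):
    """Embedded Sink: compact a stream of responses into one 4-bit signature."""
    signature = 0
    for response in responses:
        signature = lfsr_step(signature) ^ (response & MASK)
    return signature


def good_circuit(pattern):
    """Hypothetical fault-free circuit under test (binary to Gray code)."""
    return (pattern ^ (pattern >> 1)) & MASK


def faulty_circuit(pattern):
    """Same circuit with its least significant output stuck at 0."""
    return good_circuit(pattern) & 0b1110


patterns = source_patterns(seed=0b0001, count=15)
good_signature = sink_signature(good_circuit(p) for p in patterns)
faulty_signature = sink_signature(faulty_circuit(p) for p in patterns)
print(good_signature != faulty_signature)
```

In this arrangement only the seed, the pattern count and the final signature would cross the chip boundary, while the fifteen patterns and fifteen responses stay on-chip. Compaction is lossy, so a faulty response stream can in principle alias to the fault-free signature; for the fault modelled here it does not.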

As semiconductor technology moves to chips with over 100 million transistors, the
bandwidth gap between external test and embedded test will naturally grow to levels where more
high-bandwidth test functions will need to migrate on-chip. This natural evolution of embedded test
will result in a new partitioning of functions between external and embedded test. This
partitioning will be an ongoing process of shifting functions from external test to
embedded test, while maintaining the two as complementary test segments.

Mixing  Technologies 

Another implication of high complexity is the mixing of circuit types on a single IC. Monster chips
are expected to comprise non-homogeneous types of circuits. Today's complex chips have
already started to mix diverse circuits, such as digital logic, embedded DRAM and analog
blocks, into a single IC. As chip integration technologies continue to advance, more advanced
circuits will be added to this list, such as embedded FPGA, Flash and RF/Microwave, and chips may
even move beyond the electronics domain to contain micro-electromechanical (MEMS) and optical
elements.

Different types of circuits exhibit distinct defect behavior and require different test solutions.
Each type requires different test Sources to generate the test data and Sinks to compare the
responses. Typically, distinct external test equipment is used for each type of circuit, for
example one tester for logic testing, one for embedded memory test and another for analog. The
use of three external testers to test a single chip is termed triple insertion, see Fig. (4-a), and is
considered an expensive proposition. An alternative solution offered by test equipment vendors is
to use a "Super" tester, which combines the test capabilities of all three testers above. The super
testers do not assume embedded test capabilities in the chip under test; hence they tend to
contain all test features, which makes this solution an extremely expensive one.


Figure 4 Mixing Technologies in a Single Chip. Panels (4-a) and (4-b), each labelled External
Tester and Embedded Tester.
An easier and more cost-effective way to handle these mixed-circuit chips is to insert
embedded hardware Sources and Sinks corresponding to each circuit type, for example an
embedded Source/Sink for digital logic, another for memories and a third for the FPGA circuit.
Such an IC will not require more than a single, existing and lower-cost external tester, see Fig.
(4-b).


System-on-Chip  Test 

Embedded core-based system-on-chip (SOC) design implies the reuse of pre-designed complex
functional blocks, also called Virtual Components or Intellectual Property (IP). These embedded
cores  can  come  with  different  degrees  of  readiness  for  reuse  in  SOC  design,  from  different 
sources,  and  are  designed  for  use  in  a  multiplicity  of  different  SOCs.  Being  pre-designed,  an 
embedded  core  may  not  only  originate  in  a  different  organization,  but  it  is  also  developed  at  a 
different  time  than  the  SOC  that  will  use  it.  The  embedded  core  design  must  be  able  to 
anticipate  the  desired  SOC-level  test  constraints  for  all  target  SOC  designs.  Further,  it  must  be 
possible  to  package  the  results  of  any  enabled  core-test  in  a  form  that  is  compatible  with  the  test 
methodology  contexts,  and  with  the  test-development  tools  available  to  the  SOC  designers  who 
wish  to  reuse  the  core. 

Core designs need to be more test-friendly to simplify the SOC integration task, while giving
SOC designers more flexibility in choosing the best overall test methodologies for their chips.
To ensure the test-friendliness and interoperability of cores from diverse sources, a standard for
embedded core test is under development, namely IEEE P1500. The standard does not
standardize a core's internal test methods or chip-level test access configuration. The
standardization effort concentrates on:

-  a  standardized  core  test  language  (CTL),  capable  of  expressing  all  test-related  information  to 
be  transferred  from  core  provider  to  core  user;  and 

-  a standardized - but configurable and scalable - core test wrapper, which allows easy test
access of the core in a system chip design. The standard core test wrapper interfaces with an
on-chip test access mechanism and may operate under several test modes (such as internal,
external, diagnosis, etc.).
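
The wrapper concept in the second item can be pictured with a toy Python model in which a mode setting decides whether the core sees its functional inputs or data arriving from the test access mechanism. This is only a conceptual sketch: the class `CoreWrapper`, the `WrapperMode` names and the behaviour assigned to each mode are assumptions made for illustration and are not taken from the IEEE P1500 definition.

```python
from enum import Enum


class WrapperMode(Enum):
    """Hypothetical wrapper modes, loosely following the modes named in the text."""
    FUNCTIONAL = "functional"   # normal operation, wrapper is transparent
    INTERNAL = "internal"       # test the core itself through the wrapper
    EXTERNAL = "external"       # test the logic surrounding the core


class CoreWrapper:
    """Toy wrapper placed between an embedded core and the rest of the chip."""

    def __init__(self, core):
        self.core = core                    # the core is modelled as a function
        self.mode = WrapperMode.FUNCTIONAL

    def set_mode(self, mode):
        self.mode = mode

    def evaluate(self, functional_input, tam_input):
        """Return the value the wrapper presents on the core's output side."""
        if self.mode is WrapperMode.FUNCTIONAL:
            return self.core(functional_input)
        if self.mode is WrapperMode.INTERNAL:
            # The stimulus comes from the test access mechanism, not from the chip.
            return self.core(tam_input)
        # EXTERNAL: the core is isolated and the wrapper drives the surrounding logic.
        return tam_input


def increment_core(value):
    """Hypothetical embedded core used only for the demonstration."""
    return value + 1


wrapper = CoreWrapper(increment_core)
functional_out = wrapper.evaluate(functional_input=10, tam_input=99)
wrapper.set_mode(WrapperMode.INTERNAL)
internal_out = wrapper.evaluate(functional_input=10, tam_input=99)
wrapper.set_mode(WrapperMode.EXTERNAL)
external_out = wrapper.evaluate(functional_input=10, tam_input=99)
print(functional_out, internal_out, external_out)
```

The point of the sketch is that the same core can be reused unchanged in different chips, because all test access passes through the wrapper rather than through core-specific wiring.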

While it is possible to route the test access mechanism to the I/Os of the chip in order to
receive/transmit the test patterns from/to external test equipment, it is more practical and cost
effective to use on-chip test Sources and Sinks. They may be realized in two scenarios: either the
embedded core has a dedicated Source and Sink to perform its self-test, or the test access
mechanism connected to the core connects to a Source and Sink at the SOC or any
other intermediate level. The latter is mainly meant for reusing the Source and Sink for more than
one embedded core.

The most used cores today are the embedded memories. These cores have widely accepted the
embedded Source and Sink approach. Most chip manufacturers have adopted memory BIST
generation tools. As the monster chips incorporate more complex and larger numbers of
embedded cores, such as microprocessors, analog blocks, and DSPs, the embedded Source and
Sink approach needs to be extended as the test solution for the other cores in an SOC.
Figure (5) System-on-Chip Test

The monster chips are expected to embed very dense memories of large sizes (256K - 64M bits).
These dense memories may include SRAMs, DRAMs and/or Flash memories. For more than a
decade, smaller-scale memories have been embedded in mostly-logic chips and have become an
integral part of ASIC libraries. These memories were among the first to use on-chip Sources
and Sinks, which are utilized during manufacturing test to avoid using a dedicated external
memory tester in addition to the external logic tester used for the rest of the ASIC. Beyond a
certain size, such as 256K bits, memories necessitate redundancy and repair during
manufacturing test. This has been performed regularly for large stand-alone memories, typically
through a fuse-blow process using external laser repair equipment.

Due to the large size of its embedded memories, the monster chip needs redundant
rows and columns so that it can be reconfigured if faulty cells are detected. For the same reasons
as for the smaller memories, these will rely on embedded Sources and Sinks to generate and
evaluate the test data. Moreover, since the memory response data is evaluated by the embedded
Sink, the role of this Sink could be slightly expanded in order to perform diagnosis of the failed
bits. Furthermore, to avoid sending a large failed bit map to the external test equipment via
limited I/O bandwidth, the embedded Sink can be expanded further to perform built-in
redundancy analysis in order to identify the actual rows and columns needed for reconfiguration.
In this case, only the repair list needs to be communicated to the external tester, and the laser
repair equipment can then perform a hard repair.
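
The redundancy analysis step described above can be sketched in Python: given the failed-bit coordinates found by the embedded Sink and the number of spare rows and columns, it returns the short repair list that would be sent off-chip instead of the full bit map. The function name `redundancy_analysis` and the exhaustive row-or-column search are illustrative assumptions; the text does not specify an algorithm, and a hardware implementation would be organized differently.

```python
def redundancy_analysis(failed_bits, spare_rows, spare_cols):
    """Return (rows, cols) to replace, or None if the spares cannot cover all fails.

    failed_bits is an iterable of (row, column) coordinates of failing cells.
    Each failing cell must be covered by replacing either its row or its column.
    """

    def search(remaining, rows, cols):
        if not remaining:
            return rows, cols
        row, col = remaining[0]
        # Option 1: spend a spare row on the first uncovered failing cell.
        if len(rows) < spare_rows:
            rest = [bit for bit in remaining if bit[0] != row]
            found = search(rest, rows + [row], cols)
            if found is not None:
                return found
        # Option 2: spend a spare column on it instead.
        if len(cols) < spare_cols:
            rest = [bit for bit in remaining if bit[1] != col]
            found = search(rest, rows, cols + [col])
            if found is not None:
                return found
        return None

    return search(sorted(set(failed_bits)), [], [])


# Hypothetical failed-bit map: three fails share row 1, one more lies in column 3.
failed = [(1, 0), (1, 3), (1, 5), (4, 3)]
repair = redundancy_analysis(failed, spare_rows=1, spare_cols=1)
print(repair)  # ([1], [3])
```

Here four failing cells reduce to a two-entry repair list, which shows why performing the analysis on-chip relieves the limited I/O bandwidth. The search is exponential in the number of spares in the worst case, which is acceptable only because the spare counts are small.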

The final augmentation of the embedded Source and Sink is to make the memory
self-repairable. This is motivated by the fact that laser repair is often very expensive and that
continuous periodic field repair is sometimes desired. This will be achieved by expanding the
embedded test resources even further to include storage for repair data and a soft reconfiguration
mechanism. In summary, embedded test for very large memories may be required to move beyond
fault detection to include failed bit diagnosis, redundancy analysis and self-repair.
3.  Implications  of  Increased  Performance:

With  the  continuous  increase  in  IC  internal  speed,  performance-related  defect  coverage  has 
become  increasingly  important.  The  recent  Sematech  experiments  have  confirmed  the  criticality 
of  performance-related  tests.  The  100  million  transistor  monster  chip  will  necessitate  a 
comprehensive  performance  test.  Moreover,  it  has  been  predicted  that  Iddq  test  will  lose  its 
effectiveness  for  such  chips  due  to  higher  sub-threshold  currents.  Most  Iddq  failures  will 
probably  be  observed  also  as  timing/performance  anomalies. 

A performance test applied from external test equipment cannot adequately and cost
effectively test the high clock speeds and provide the necessary performance-related defect
coverage, because such equipment is typically built in older technology than the
chips it tests, and higher-speed test equipment is substantially more expensive. The SIA
roadmap indicates that major yield losses and cost increases are related to the slower growth of
external test equipment speeds versus the ever-improving internal chip speed. While external
tester accuracy has improved at a rate of 12% per year, internal chip speeds have improved at
30% per year. The typical headroom, with external testers five times faster than internal chip
speeds, has all but disappeared. With the current trend, the cycle time of manufactured chips will
approach external tester timing accuracy, see Fig. (6), in less than ten years. A crossover may
occur near 2010, but by 2001, yield losses due to external test inaccuracy will be unacceptable.
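
The effect of the two growth rates quoted above can be explored with a short Python sketch. It assumes that both quantities compound annually at the stated rates and asks how long a given headroom factor lasts. The function name `years_to_crossover` is hypothetical, and the starting headroom is a free parameter: the text gives no starting year, so the result is not meant to reproduce the dates quoted above.

```python
import math


def years_to_crossover(headroom, tester_rate, chip_rate):
    """Years until a headroom factor shrinks to 1 under compound annual growth.

    headroom    -- current ratio of tester capability to chip speed (>= 1)
    tester_rate -- annual improvement of the external tester (0.12 for 12%)
    chip_rate   -- annual improvement of internal chip speed (0.30 for 30%)
    """
    if chip_rate <= tester_rate:
        raise ValueError("the chip must improve faster than the tester to close the gap")
    # Each year the headroom is multiplied by (1 + tester_rate) / (1 + chip_rate).
    yearly_shrink = (1 + chip_rate) / (1 + tester_rate)
    return math.log(headroom) / math.log(yearly_shrink)


# Assumed starting point: a tester five times ahead of the chip.
years = years_to_crossover(headroom=5.0, tester_rate=0.12, chip_rate=0.30)
print(f"headroom of 5x is consumed in about {years:.1f} years")
```

Under these assumed inputs the headroom shrinks by roughly 14% per year, so even a fivefold margin lasts only about a decade; a smaller starting margin is consumed correspondingly sooner.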


Figure (6) Test Accuracy and Yield

If external test accuracy cannot keep up with the internal chip speed, our monster chip needs to
leverage the internal speed of its silicon and utilize dedicated embedded test resources for the
tests that require timing precision. Because it is built on the same piece of silicon as the monster
chip, an embedded test resource will have a cycle time comparable to the chip's internal speed.
Therefore, it will allow accurate performance-related tests and precision measurements, and
eliminate the potential yield losses predicted by the SIA roadmaps.
4.  Implications  of  Higher  Density:

The continuous advancement in semiconductor technology will keep increasing the silicon
density. The number of millions of transistors per sq. cm. will increase by a factor of three in the
next five years. The resulting density level in a monster chip will have a number of test-related
implications.

According  to  the  SIA  roadmaps,  the  increase  in  semiconductor  density  causes  reduction  in 
defect  sizes.  The  complexity  of  monster  chips  on  the  one  hand,  and  the  reduction  in  their  defect 
sizes  on  the  other,  cause  a  drastic  increase  in  the  difficulty  in  fault  localization.  The  difficulty  in 
localizing  faults  increases  one  order  of  magnitude  every  six  years. 

The  best  tool  to  perform  fault  localization  and  defect  analysis  in  semiconductor  manufacturing 
is  the  test  process.  The  feedback  loop  derived  from  the  test  process  is  the  only  way  to  analyze 
and  isolate  many  of  the  defects  in  manufacturing.  Understanding  failure  mechanisms  and 
providing  corrective  actions  cannot  occur  without  the  ability  to  localize  faults  to  an  area  that  can 
be  inspected  in  a  practical  and  cost-effective  manner. 

The  increased  density  in  monster  chips  will  severely  challenge  the  physical  fault  localization 
processes.  The  hardware  for  embedded  test  scales  with  the  chip  itself.  The  existing  embedded 
test  resources  whether  in  embedded  memories,  cores,  user  defined  logic  or  analog  blocks  can  act 
as  the  infrastructure  to  collect  the  faulty  data  from  the  block  under  test.  This  helps  to  quickly 
isolate  faults.  Leveraging  the  embedded  test  hardware  resources  can  aid  in  defect  isolation.  The 
failure  pattern  must  map  to  a  physical  location  on  a  circuit.  Software-based  test  tools  compatible 
with major embedded test methodologies (such as scan or BIST) are needed to save failure
pattern information so that it can be analyzed based on predetermined failure mode information.
This allows yield engineers to determine the location and causes of circuit failures more quickly
and precisely.

Collecting more parametric data, as measured on external test equipment, will aid in sourcing
unmodeled defect types. The continuous increase in density and in multi-level metal layers
results in new fault models, which make the traditional stuck-at fault model less effective.
Recent studies show that a few large-area spot defects act as single defects that affect
multiple transistors and gates simultaneously. In general, pseudo-random test patterns are used
to reduce the test escapes due to such unmodeled faults. A number of experimental studies have
shown the effectiveness of pseudo-random patterns in reducing test escapes.
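
The text does not say how the pseudo-random patterns are produced. In embedded test they commonly come from a linear feedback shift register (LFSR). The following is a minimal software sketch of that idea, assuming a hypothetical 4-bit register whose feedback is the XOR of its two most significant bits; the function name, the width and the tap choice are illustrative and are not taken from the paper.

```python
def lfsr_patterns(seed, count, width=4):
    """Return `count` successive states of a small Fibonacci-style LFSR.

    Assumption (not from the paper): the feedback bit is the XOR of the
    two most significant bits. For width=4 this visits all 15 non-zero
    states before repeating; other widths need other taps.
    """
    mask = (1 << width) - 1
    state = seed & mask
    if state == 0:
        raise ValueError("an all-zero state locks the register at zero")
    patterns = []
    for _ in range(count):
        patterns.append(state)
        # XOR of the two most significant bits becomes the new low bit.
        feedback = ((state >> (width - 1)) ^ (state >> (width - 2))) & 1
        state = ((state << 1) | feedback) & mask
    return patterns


if __name__ == "__main__":
    for pattern in lfsr_patterns(seed=0b0001, count=15):
        print(format(pattern, "04b"))
```

Each printed word would be applied as one test pattern; the all-zero word never appears, which is a known property of this kind of generator.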

Signal  integrity  and  electromagnetic  phenomena  will  become  an  increasingly  important  test 
issue  with  the  appearance  of  monster  chips.  New  fault  models  including  soft  error  models  that 
incorporate  the  effects  of  electromagnetic  fields  need  to  be  developed.  The  increase  in  soft 
errors  is  due  to  the  reduction  in  device  size  and  voltage  supply,  which  cause  an  increase  in  noise 
sensitivity  and  an  increase  in  susceptibility  to  cosmic  variations,  such  as  alpha  particle  radiation. 
The  resulting  soft  errors,  while  not  new  for  space  oriented  applications,  will  start  to  cause 
considerable risks at sea-level altitude. Due to their transient nature, soft errors need to be
continuously monitored during the field operation of a chip. This will require dedicated
hardware for embedded test that performs on-line testing.

New embedded sensors to monitor different on-chip parameters should be developed to identify
race conditions and other failure modes that are functions of parametric variations. These
sensors will be an integral part of the embedded test infrastructure and will leverage the
existing chip-level test data and control mechanisms. Handling the test data through signature
analysis techniques would significantly reduce the need for hardware failure analysis. In fact,
the more embedded test monitoring and on-chip data acquisition resources are used, the less
hardware diagnosis is required for silicon debug and failure analysis. With high-density
packages such as flip-chips in particular, hardware diagnosis will become more and more
constrained, because conventional electron beam and thermal imaging techniques do not apply to
flip-chip attach technologies: there is no front-side accessibility. Only a very limited set of
back-side hardware diagnosis techniques can be used, such as photon emission and scanned lasers,
and only if the defects are sensitive to them. New back-side accessibility tools and solutions
are needed.

In addition to fault localization and failure analysis, producing the monster chip requires
integrated yield analysis capabilities that make use of the defect and failure analysis data.
These capabilities need to be built into software tools that automatically access multiple
databases and establish correlations between data of different types. Some data sources are
time-based, others are chip-based or wafer-based. Automated data reduction algorithms that
source defects from multiple data sources must be developed to reduce defect sourcing time. The
SIA roadmap identified this as one of the key requirements for yield learning and improvement.

5.  Implications  of  Reduced  Cost: 

The increasing capital cost of external test equipment is one of three major test-related
problems that the SIA roadmaps predicted. The cost per pin for external test has remained
essentially flat for the past 20 years at around $10K/pin, because the demands for higher speed,
greater accuracy, more time sets and increased data volume have offset all the cost reductions
sought for external test equipment. In its 1997 roadmaps, the SIA indicates that tester cost
will reach $20M in the year 2010 unless IC design changes to incorporate more embedded test. It
also predicts that by the same year it may cost more to test a transistor than to manufacture
it.

The cost of embedded test consists of the additional silicon needed to incorporate the test
functions and a limited impact on yield. A set of embedded Sources and Sinks for a
state-of-the-art mixed circuit IC takes about 10,000 logic gate equivalents, assuming that a scan
infrastructure is already in place. For designs of about 400,000 to 500,000 gates, the relative
silicon cost of embedded test equals the cost of external test. For chips above this size, the
silicon investment constitutes less than 1 percent of the silicon manufacturing costs; see
Fig. (7). As for the yield impact, the embedded Sources and Sinks increase the silicon area and
hence can reduce manufacturing yield, because the bigger the chip area, the greater the chance
of a particle falling on the chip and causing a defect. However, adding 10,000 gates to a
monster chip of tens of millions of gates has a negligible impact.
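
The area argument can be checked with a few lines of arithmetic. The sketch below uses the 10,000-gate figure quoted above; the Poisson yield model and the value of one expected defect per chip are assumptions added here purely for illustration and do not come from the paper.

```python
import math

EMBEDDED_TEST_GATES = 10_000  # gate-equivalent size quoted in the text


def area_overhead(chip_gates, test_gates=EMBEDDED_TEST_GATES):
    """Fraction of extra silicon added by the embedded test logic."""
    return test_gates / chip_gates


def relative_yield(chip_gates, defects_per_chip=1.0,
                   test_gates=EMBEDDED_TEST_GATES):
    """Yield with embedded test divided by yield without it.

    Assumes a Poisson yield model, Y = exp(-expected defects), with the
    expected defect count growing in proportion to the gate count.
    """
    extra_defects = defects_per_chip * area_overhead(chip_gates, test_gates)
    return math.exp(-extra_defects)


if __name__ == "__main__":
    for gates in (500_000, 1_000_000, 20_000_000):
        print(f"{gates:>10} gates: area overhead {area_overhead(gates):.2%}, "
              f"relative yield {relative_yield(gates):.4f}")
```

Under these assumptions the area overhead is 2% at 500,000 gates, 1% at one million gates and 0.05% at 20 million gates, and the relative yield stays very close to 1 for the largest chip.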


Figure (7) External vs. Embedded Test Cost
(horizontal axis: SOC chip capacity in equivalent logic gates; plot label: Silicon Manufacturing)

After it is successfully produced, a monster chip is destined to become an integral part of a
larger electronic system. The cost of test and diagnosis for such electronic systems typically
reaches 40-50% of the total product realization cost. If embedded test resources are
incorporated into the chip, they can be reused hierarchically during board and system
manufacturing. This reduces the time and cost required to develop diagnostic firmware and
interfaces for board and system test and maintenance features. The concept of embedded test can
be applied at every electronic assembly level: chip-level embedded test is used during board
test, board-level embedded test during system/box test, and the total embedded solution is used
by hardware and software during field test.

6.  Conclusions: 

Semiconductor scaling trends such as complexity, performance and density have major
implications for testing 100-million-transistor chips. The industry has identified a number of
hurdles that result from these implications. Overcoming these hurdles requires a new approach
to testing, in which the test functions are partitioned into two main components: embedded test
and external test. Both complementary components are needed, but the balance of functions
between them depends on numerous technological and economic factors.

The use of embedded test hardware, which is invested on-chip, goes beyond plain chip test.
This hardware provides a number of crucial test-related functions, such as diagnosis,
measurement, debug, failure analysis, and even repair. The embedded test hardware is also
reused from the core test level through the chip, board and system test levels.

External test and embedded test will be used simultaneously on most chip designs. As chips
become more complex, new functions will be transferred from the external test component to the
embedded one.


Abstract. In this paper we propose the design of an n x m carry-save multiplier (CSM) that is
easily testable with respect to path delay faults, and give a path selection method such that
all the paths selected for testing are Single Path Propagating Hazard Free Robustly Testable
(SPP-HFRT). Only three additional test inputs are required, while the hardware overhead is very
small and the delay overhead negligible.

I.  Introduction 

Increasing performance requirements make it difficult to design VLSI circuits with large
timing margins. Hence, due to imprecise delay modeling, statistical variations of the
parameters during the manufacturing process, and the occurrence of physical defects in the
integrated circuits, the performance of a manufactured circuit may be lower than expected even
though its logic function is correct. The path delay fault model is the most general delay
fault model used to model changes in the timing behavior of a circuit. Under this model a path
is declared faulty if it fails to propagate a transition from the path input to the path output
within a specified time interval [1]. There are two major problems associated with path delay
fault testing: a) the large number of physical paths and b) their robustness. Selection of
paths for testing is especially difficult in performance-optimized designs because they often
have a large number of paths with long propagation delays. Moreover, a physical defect may
increase the delay along a non-critical path so that it eventually becomes a critical path.

It has been shown in [2] that by measuring the delays along a suitable, very small set R
of physical paths, the propagation delays along any other path can be calculated. However, to
be able to measure the propagation delay along the paths of R, they must be SPP-HFRT [2].
Unfortunately, for most circuits, the CSM among them, a suitable set of SPP-HFRT paths does
not exist. In this paper modifications of the CSM are proposed that lead to a suitable set R'
of SPP-HFRT paths. We give a method to derive the set R' and we show that it is an impressively
small percentage of all physical paths. By measuring the delay along the paths of R', the delay
along any other path can be calculated. The delay overhead due to the modifications is
negligible, while the hardware overhead for practical-size CSMs is very small.

II. CSM Modifications

An n x m CSM is a circuit with inputs (A1, A2, ..., An) and (B1, B2, ..., Bm) and outputs
(O1, O2, ..., On+m). Figure 1 presents the 4 x 4 carry-save multiplier. We consider the
multiplier to consist of two blocks. The first block, D0, consists of the network of carry-save
adders and the associated logic. The second block, D1, is an (n-1)-bit adder, which can be
implemented as a ripple-carry or a group carry look-ahead adder.

In [3] we have shown that, by using multiplexers to make the inputs and outputs of the
embedded blocks accessible from the primary ports of the circuit, the path delay fault testing
of the circuit is reduced to the path delay fault testing of the blocks that constitute it. By
adding multiplexers to the original CSM design (Figure 2), we can manipulate the two blocks, D0
and D1, individually. We will hereafter deal only with the path delay fault testing of D0,
since efficient path delay fault testing techniques for both ripple-carry and carry look-ahead
implementations of D1 have been presented in [4]. A suitable set R" of SPP-HFRT paths of D0,
such that by measuring the propagation delays along them the delay along any other path can be
calculated, does not exist. Hence some additional modifications of D0 are required.
Specifically, the half adders of D0, that is, the adders of the first row, are replaced by full
adders. The extra input of the leftmost adder of the first row is driven by a test input t1. The
extra inputs of the remaining adders of the first row are driven alternately by the t2 and t3
test inputs. During normal circuit operation all three test inputs t1, t2 and t3 are driven to 0.


III. Path selection procedure

We will denote an AND gate of D0 as AND (x, y) when its inputs are Ax and By. Since
each AND gate has two inputs, there are two sub-paths starting from the inputs and ending at
the output of every AND gate of the multiplier. Each such sub-path will be denoted by a
triplet of the form (s, x, y) or (t, x, y). The first element denotes the start of the sub-path:
s or t defines the sub-path starting from the input Ax, 1 ≤ x ≤ n, or By, 1 ≤ y ≤ m,
respectively, and ending at the output of AND (x, y). The last two elements of the triplet
indicate the specific AND gate of the design that we refer to.

We consider that each full adder has been implemented in a robustly testable way, as for
example in [4]. Each full adder has three inputs I1, I2, I3 and two outputs, S (sum) and C
(carry), as shown in Figure 1. Table 1 lists the possible sub-paths of any full adder. Each full
adder of D0 will be described as ADDER (x, y), in a notation similar to the one introduced
earlier for the AND gates. We again denote the sub-paths along full adders by triplets of the
form (u, x, y), where u ∈ {a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r} specifies one
of the possible sub-paths along the full adder and x, y indicate the specific full adder of the
design we refer to. From Table 1 we can easily see that for the same values of I1, I2, I3 we may
have two sub-paths, one starting from an input and ending at the S output and the other starting
from the same input and ending at the C output.


Table 1. Sub-paths along any full adder of D0.

Using the above notation, a physical path of D0 can be described as an ordered set of
sub-paths. Given that each sub-path is denoted as a triplet, we conclude that a path can be
described as an ordered set of triplets.

In the following, we give a very small subset of the paths of D0 such that, if we measure
the delay along them, we can calculate the delays along every path of D0 (the proof is given in
[5]). In [5] we have also proven that all selected paths are SPP-HFRT.

• PA1 is the set of paths {(s, x, y), L}, where all sub-paths of L have the form (a, x', y').
Obviously |PA1| = m + n - 1, where |X| denotes the cardinality of set X.

• PA2 is the set of paths {(t, x, y), L}. |PA2| = m + n - 1.

• PA3 is the set of paths {(s, x, y), O}, where O are all possible sub-paths with only one
sub-path of the form (w, x', y') for w = b, c, d and all other sub-paths of the form (a, x", y").
|PA3| = 3(m-1)(n-1).

• PB1 is the set of paths {(s, x, y), (w, x, y-1), L}, where w ∈ {i, j, k, l}. |PB1| = 4(n-1)(m-1).

• PB2 is the set of paths {(t, x, y), (i, x, y-1), L}. |PB2| = (n-1)(m-1).

• PC1 is the set of paths {N, (m, x, y), (w2, x, y+1), M}, w2 ∈ {e, f, g, h}, and N, M include
only s or a type sub-paths. |PC1| = 4(n-1)(m-1) + (n-1).

• PC2 is the set of paths {N, (n, x, y), (e, x, y+1), M}. |PC2| = n(m-1).

• PD1 is the set of paths {(s, x, y), (w1, x, y-1), (e, x, y), M}, w1 ∈ {q, r}. |PD1| = 2(n-1)(m-1).

• PE1 is the set of paths {N, (m, x, y-1), (p, x, y), (e, x, y+1), M}. |PE1| = 2(n-1)(m-2).

• PE2 is the set of paths {N, (n, x, y-1), (o, x, y), (e, x, y+1), M}. |PE2| = 2(n-1)(m-2).

IV. Conclusions

Path delay fault testing of a CSM is a difficult task due to the excessively large number
of its physical paths. In this paper, by introducing minor modifications to the original
multiplier design (Table 3 presents the hardware overhead of the proposed easily testable
design), we present a method for selecting a very small subset of physical paths along which
the propagation delay should be measured. The cardinality of this subset is orders of magnitude
smaller than the number of all physical paths of the original design, as listed in Table 2 for
multipliers with their last stage implemented as a ripple carry adder. The delay along any
other path of the multiplier can be calculated from the delays along the selected paths [5].
The selected paths have been proven to be SPP-HFRT [5].
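
The cardinalities given in Section III can be summed directly. The following is a small sketch, assuming the ten formulas exactly as they are listed there; the function names are illustrative and not part of the paper.

```python
def selected_path_counts(n, m):
    """Cardinalities of the ten path sets of D0 listed in Section III."""
    a = (n - 1) * (m - 1)
    return {
        "PA1": m + n - 1,
        "PA2": m + n - 1,
        "PA3": 3 * a,
        "PB1": 4 * a,
        "PB2": a,
        "PC1": 4 * a + (n - 1),
        "PC2": n * (m - 1),
        "PD1": 2 * a,
        "PE1": 2 * (n - 1) * (m - 2),
        "PE2": 2 * (n - 1) * (m - 2),
    }


def total_selected_paths(n, m):
    """Sum of the ten cardinalities for an n x m multiplier."""
    return sum(selected_path_counts(n, m).values())


if __name__ == "__main__":
    for size in (8, 16, 32, 64):
        print(f"{size}x{size}: {total_selected_paths(size, size)}")
```

For the 8x8, 16x16, 32x32 and 64x64 cases the ten sets sum to 947, 4307, 18323 and 75539 paths. Each figure is 85 below the corresponding "number of paths to be tested" in Table 2; that difference is not accounted for by the sets listed in Section III.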


Table 2. Reduction in number of paths that need to be tested

CSM size   Total number of    Number of paths   Reduction
n x m      physical paths     to be tested      %
8x8        5.825x10^8         1032              99.9998
16x16      3.189x10^17        4392              ≈100
32x32      6.245x10^34        18408             ≈100
64x64      2.142x10^69        75624             ≈100

Table 3. Hardware overhead of the proposed Easily Testable Multiplier

CSM size   Number of Gates                      Overhead %
n x m      Mrc      Mcla     ETMrc    ETMcla    (ETMrc - Mrc)/Mrc   (ETMcla - Mcla)/Mcla
8x8        792      818      934      960       17.93               17.36
16x16      3496     3534     3798     3836      8.64                8.54
32x32      14664    14726    15286    15348     4.24                4.22
64x64      60040    60150    61302    61412     2.10                2.10

Mrc : original CSM with the last stage implemented as a ripple carry adder.
Mcla : original CSM with the last stage implemented as a group carry look ahead adder.
ETMrc : proposed CSM with the last stage implemented as a ripple carry adder.
ETMcla : proposed CSM with the last stage implemented as a group carry look ahead adder.
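
The overhead columns of Table 3 follow from the gate counts through the ratio (ETM - M)/M. The sketch below recomputes them; the gate counts are transcribed from the table, while the dictionary and function names are illustrative.

```python
# Gate counts transcribed from Table 3: size -> (Mrc, Mcla, ETMrc, ETMcla)
GATE_COUNTS = {
    8: (792, 818, 934, 960),
    16: (3496, 3534, 3798, 3836),
    32: (14664, 14726, 15286, 15348),
    64: (60040, 60150, 61302, 61412),
}


def overhead_percent(original, testable):
    """Relative hardware overhead (ETM - M) / M, in percent."""
    return 100.0 * (testable - original) / original


if __name__ == "__main__":
    for size, (mrc, mcla, etmrc, etmcla) in GATE_COUNTS.items():
        print(f"{size}x{size}: "
              f"+{etmrc - mrc} gates, "
              f"RC {overhead_percent(mrc, etmrc):.2f} %, "
              f"CLA {overhead_percent(mcla, etmcla):.2f} %")
```

The recomputed percentages agree with the tabulated ones to within 0.01. In every row the modification adds the same number of gates to both adder variants (142, 302, 622 and 1262 gates), which is why the relative overhead shrinks as the multiplier grows.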


Abstract. This paper presents a novel idea for test set minimization, which is based on
representing the fault matrices of test sets by bipartite graphs. We show that bipartite graphs
provide a remarkable speed-up of the compaction of test sequences in comparison with matrix
representations. Results show that, in a number of realistic cases, the proposed algorithm is
capable of proving that the global minimum of the static test set compaction problem has been
reached.


I.  Introduction 

Minimization of the number of patterns in a test set is an essential problem for the chip
manufacturer, who faces the test of millions of units per annum [1]. The time required to test
a chip on the ATE is directly proportional to the length of the test sequence. Therefore, the
number of patterns in a test set is an important parameter in test pattern generation. There
exist two types of test compaction techniques: static and dynamic. With static compaction a
test set is generated and subsequently attempts are made to minimize it without reducing its
fault coverage. Dynamic test set minimization, on the other hand, is performed at the time when
the tests are being generated. This often implies modification of the test generation
algorithm. This paper considers the static technique only, where test sets are created by any
automatic test pattern generator (ATPG) and compacted by a standalone compaction tool.

The problem of static test set minimization is NP-complete. In order to try all possible
solutions, n! sequences of the test patterns have to be considered, which is impractical even
for a moderate value of n. Thus, more sophisticated methods have to be applied to solve the
problem. Some of the advanced ATPG algorithms include test set compaction. For example, in [2]
and [3] test sets are compacted by fault simulating the patterns in reverse order. However,
this method is too simple to guarantee satisfactory results. Many authors [4, 5, 6] have
successfully applied genetic algorithms to test set compaction. This paper presents a
deterministic approach, which has the following advantages over the above-mentioned methods:
1) in some cases, it is capable of proving that the global minimum has been reached; 2) it is
considerably faster than genetic methods.

II.  Bipartite  Graph  Representations 

An example of an abstract fault matrix is given in Fig. 1a. Each row in the matrix
corresponds to a test pattern and each column represents a fault. A '1' in a row denotes that
the pattern covers the corresponding fault, while a '0' denotes that it does not. Our aim is to
minimize the number of rows in the matrix so that there remain as many columns with at least
one '1' as in the initial matrix. It is easy to see that the minimal solution for the current
example is to select patterns p2 and p4 while discarding p1 and p3.


Figure 1. Fault Matrix and its Bipartite Graph Representation


The matrix can be represented by a bipartite graph G = (V, E) with two disjoint subsets
of vertices P and F (P ∩ F = ∅, P ∪ F = V). Each vertex pi (pi ∈ P) corresponds to a vector and
each vertex fj (fj ∈ F) corresponds to a fault. There exists a connection between pi (pi ∈ P)
and fj (fj ∈ F) if vector pi covers fault fj. The graph derived from the fault matrix in Figure
1a is presented in Figure 1b. Bipartite graphs provide a more compact data structure for the
problem. In the classical approach we have to traverse an n x m matrix, while in the graph n + m
vertices with all their edges have to be considered. In other words, the space to be traversed
in a bipartite graph is, in the worst case, equal to the number of ones in the matrix multiplied
by two (each edge has two end points). Section IV presents the experimental results that show
the speed-up provided by bipartite graph representations in comparison with fault matrices.
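
The conversion from a fault matrix to a bipartite graph can be sketched as follows. This is only an illustration of the data structure, assuming two adjacency maps (pattern to faults, fault to patterns); the example matrix is hypothetical and is not the matrix of Fig. 1a, which is not reproduced in the text.

```python
def build_bipartite(matrix):
    """Turn a 0/1 fault matrix (rows = patterns, columns = faults) into
    two adjacency maps: pattern -> faults covered, fault -> patterns."""
    covers = {p: set() for p in range(len(matrix))}
    covered_by = {f: set() for f in range(len(matrix[0]))}
    for p, row in enumerate(matrix):
        for f, bit in enumerate(row):
            if bit:
                covers[p].add(f)
                covered_by[f].add(p)
    return covers, covered_by


# Hypothetical 4 x 6 fault matrix (rows p1..p4, columns f1..f6).
EXAMPLE_MATRIX = [
    [0, 1, 1, 1, 0, 0],  # p1
    [1, 1, 1, 0, 0, 0],  # p2
    [0, 0, 0, 0, 1, 0],  # p3
    [0, 0, 0, 1, 1, 1],  # p4
]

if __name__ == "__main__":
    covers, covered_by = build_bipartite(EXAMPLE_MATRIX)
    edges = sum(len(faults) for faults in covers.values())
    cells = len(EXAMPLE_MATRIX) * len(EXAMPLE_MATRIX[0])
    print("matrix cells to traverse:", cells)
    print("edge end points to traverse:", 2 * edges)
```

The sparser the fault matrix, the larger the gap between the number of matrix cells and the number of edge end points, which is the source of the speed-up discussed in the text.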


III.  Compaction  Algorithm 

There has been much debate over whether genetic or greedy approaches should be used to
solve the test set compaction problem. Supporters of genetic approaches claim that greedy
algorithms often get stuck in local minima [8]. In order to investigate the possibilities of
the greedy approach we propose a simple algorithm, which uses the number of previously
undetected faults detected by a vector as its heuristic. Let us consider the simple method
described in procedure 1.

procedure 1

1. Initial test set T = ∅.

2. Select a vertex p (p ∈ P) which has the highest degree.

3. Remove p and all the vertices fi (fi ∈ F) which have an edge between them and p.

4. Add the pattern corresponding to p to the test set T.

5. If F ≠ ∅ go to point 2, otherwise T forms the minimized test set.
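
The five steps above can be sketched in a few lines. This is a minimal illustration, assuming the graph is stored as a map from each pattern to the set of faults it covers; the tie-break rule (first pattern in insertion order) and the example data are assumptions, since the text specifies neither.

```python
def greedy_compaction(covers):
    """Sketch of procedure 1: repeatedly pick the pattern that detects
    the most still-uncovered faults. `covers` maps pattern -> faults."""
    remaining = set().union(*covers.values())  # vertex set F
    candidates = dict(covers)                  # vertex set P
    selected = []                              # test set T
    while remaining:
        # Degree in the reduced graph = number of uncovered faults hit.
        best = max(candidates, key=lambda p: len(candidates[p] & remaining))
        selected.append(best)
        remaining -= candidates.pop(best)
    return selected


# Hypothetical example data, not the graph of Figure 1b.
EXAMPLE = {
    "p1": {"f2", "f3", "f4"},
    "p2": {"f1", "f2", "f3"},
    "p3": {"f5"},
    "p4": {"f4", "f5", "f6"},
}

if __name__ == "__main__":
    print(greedy_compaction(EXAMPLE))
```

With this tie-break the sketch selects p1, p4 and p2 for the example data, one pattern more than the smallest cover {p2, p4}, so the greedy choice alone does not guarantee a minimal test set.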

On the fault matrix this procedure would be equivalent to repeatedly choosing the pattern
that detects the most new faults with respect to the faults already covered by the previously
chosen patterns. In practice procedure 1 works fairly well. However, in many cases it does not
guarantee minimal results. For example, if we apply it to the graph presented in Figure 1b, we
will see that vertex p1 could be chosen first, since it is one of the patterns with the highest
degree. This choice would not lead to the minimal test set, which is {p2, p4}. In order to
reduce the number of such incorrect decisions we introduce the following preprocessing step to
our algorithm.

preprocess

1. If a vertex f (f ∈ F) exists whose degree is 1, then go to point 2, else end preprocessing.

2. Remove pk (pk ∈ P) that is connected to f, together with all fj (fj ∈ F) that are connected to pk.

3. Include the pattern corresponding to vertex pk in the test set T and go to point 1.
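
The preprocessing step can be sketched in the same style. The code below is an illustration only, again assuming a map from patterns to covered faults and using hypothetical example data; faults are visited in sorted order, which is an arbitrary choice made here.

```python
def select_essential(covers):
    """Preprocessing sketch: whenever a fault is covered by exactly one
    pattern, select that pattern and drop every fault it covers.
    Returns (selected patterns, faults still uncovered)."""
    uncovered = set().union(*covers.values())
    selected = []
    for fault in sorted(uncovered):
        if fault not in uncovered:
            continue  # already removed together with an earlier pattern
        owners = [p for p, faults in covers.items() if fault in faults]
        if len(owners) == 1:  # vertex of degree 1 in the graph
            selected.append(owners[0])
            uncovered -= covers[owners[0]]
    return selected, uncovered


# Hypothetical example data, not the graph of Figure 1b.
EXAMPLE = {
    "p1": {"f2", "f3", "f4"},
    "p2": {"f1", "f2", "f3"},
    "p3": {"f5"},
    "p4": {"f4", "f5", "f6"},
}

if __name__ == "__main__":
    print(select_essential(EXAMPLE))
```

For the example data the step selects p2 (the only pattern covering f1) and p4 (the only pattern covering f6), which already covers every fault, so nothing is left for the greedy phase.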

The presented preprocessing step is equivalent to choosing unique patterns. We say that a
pattern is unique when it covers some faults that are not covered by any other pattern. It is
obvious that these patterns have to be included in the minimal test set. On the right in Fig. 2
is the graph after application of the preprocessing step. As our experiments show, the majority
of patterns in the generated test sets are unique. Therefore, by choosing these patterns as a
first step, the search space can be significantly reduced. In order to further compact the test
sets obtained by the above-described algorithm, we tried to detect any dispensable patterns in
our test sets, i.e. patterns whose removal would not lower the fault coverage. We did not find
any such patterns in our test sets. This means that all the test sets that we generated are
reducts [9]. However, it does not imply that they are the global minima for the static
compaction problem. The next section explains the conditions that allow us to determine whether
a test set is a global minimum.

In the following we present the criterion for global minimum detection by the
above-described algorithm. The criterion is more general and can be applied to any other
compaction algorithm: a static test set compaction algorithm has reached the global minimum if
the number of patterns in the minimized test set exceeds the number of unique patterns by fewer
than three. If the minimized test set contains three patterns more than the set of unique
patterns, we can no longer prove the 'globalness' of the solution, although its probability is
very high. In these cases, more sophisticated techniques have to be applied. Experimental
results show that many of the minimized test sets are proved to be global minima since they
comply with the above-mentioned criterion.
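
The criterion amounts to a one-line comparison. A minimal sketch, with an illustrative function name that does not appear in the paper:

```python
def proven_global_minimum(result_size, unique_count):
    """Criterion as stated in the text: the result is provably minimal
    when it exceeds the number of unique patterns by fewer than three."""
    if result_size < unique_count:
        raise ValueError("every unique pattern must be part of the result")
    return result_size - unique_count < 3


if __name__ == "__main__":
    print(proven_global_minimum(52, 50))  # two extra patterns
    print(proven_global_minimum(55, 52))  # three extra patterns
```

The two calls use the c880 and c432 figures reported for the deterministic test sets: 50 unique patterns with a result of 52 satisfy the criterion, while 52 unique patterns with a result of 55 do not.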

IV.  Experimental  Results 

Experiments were carried out with three types of test pattern generators: deterministic,
genetic and random. For deterministic test pattern generation, the PODEM algorithm [11] was
implemented. All of the test generation tools were taken from the Turbo Tester package [7].
Table I shows the test set compaction times for the ISCAS'85 benchmarks [10]. As the results
show, the proposed algorithm performs very fast. The longest compaction time is required for
the test set of c7552 that is generated by the random ATPG. Here, the time spent by the
algorithm on the bipartite graph model was 4 hundredths of a second, while it was 3 tenths of a
second for the matrix representation. The time required to construct the graph was not
included. All the experiments were run on a 233 MHz Pentium II PC under the Windows 95
operating system. In order to understand the experimental data in Tables II-IV some additional
definitions are needed. A pattern that does not detect any additional faults with respect to
the set of unique vectors is called a noncontributing pattern. The number of patterns in the
test set that are neither noncontributing nor unique is referred to as the search space. The
search space represents the size of the problem for the pattern selection algorithm.
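
These definitions translate into a simple three-way classification. The sketch below assumes a map from patterns to covered faults and uses hypothetical example data; it is meant only to illustrate the definitions, not to reproduce the authors' tool.

```python
def classify_patterns(covers):
    """Split patterns into unique, noncontributing and search-space
    groups, following the definitions given in the text."""
    owners = {}
    for pattern, faults in covers.items():
        for fault in faults:
            owners.setdefault(fault, set()).add(pattern)
    # Unique: covers at least one fault that no other pattern covers.
    unique = {next(iter(ps)) for ps in owners.values() if len(ps) == 1}
    covered_by_unique = set().union(*(covers[p] for p in unique))
    # Noncontributing: adds nothing to what the unique patterns detect.
    noncontributing = {
        p for p, faults in covers.items()
        if p not in unique and faults <= covered_by_unique
    }
    search_space = set(covers) - unique - noncontributing
    return unique, noncontributing, search_space


# Hypothetical example data.
EXAMPLE = {
    "p1": {"f1", "f2"},
    "p2": {"f2"},
    "p3": {"f3", "f4"},
    "p4": {"f4", "f5"},
    "p5": {"f3", "f5"},
}

if __name__ == "__main__":
    print(classify_patterns(EXAMPLE))
```

In the example p1 is unique (only it covers f1), p2 is noncontributing, and p3, p4 and p5 form the search space. By construction the three groups partition the test set, which matches the tables, where the unique, noncontributing and search columns add up to the number of tests.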

Experiments show that selection of unique vectors as a preprocessing step considerably
reduces the search space for the compaction algorithm. For the deterministic test sets
51-83 % of the patterns were unique and the search space included only up to 38 % of the
initial test set, while the ratio of unique patterns was 71-100 % for genetic and 57-100 % for
random tests. Search space was up to 21 % for genetic and 26 % for random generator sets.

circuit   PODEM, s          Genetic, s        Random, s
          graph   matrix    graph   matrix    graph   matrix
c432      --      0.02      --      0.02      --      --
c499      0.00    0.02      0.00    0.02      --      --
c880      0.01    0.03      0.01    0.03      --      --
c1908     0.01    0.03      0.01    0.03      0.01    --
c2670     0.01    0.05      0.01    0.04      0.01    --
c3540     0.01    0.06      0.01    0.05      0.01    --
c5315     0.02    0.18      0.01    0.06      0.01    --
c6288     0.01    0.05      0.01    0.04      0.01    --
c7552     0.02    0.16      0.01    0.11      0.04    --

Table I. Compaction Times for the Test Sets

circuit   tests   unique       noncontrib   search   result
c432      89      52 (58%)     31           6        55
c499      140     94 (67%)     31           15       100
c880      70      50 (71%)     15           5        52*
c1908     144     120 (83%)    20           4        122*
c2670     160     111 (69%)    28           21       119
c3540     201     137 (68%)    37           27       145
c5315     178     91 (51%)     20           67       108
c6288     41      30 (73%)     3            8        33
c7552     276     190 (69%)    62           24       198

Table II. Results for Deterministic ATPG

circuit   tests   unique       noncontrib   search   result
c432      51      44 (86%)     1            6        46*
c499      86      84 (98%)     0            2        85*
c880      46      35 (76%)     5            6        38
c1908     121     109 (90%)    10           2        110*
c2670     112     80 (71%)     9            23       87
c3540     155     133 (86%)    11           11       138
c5315     115     93 (81%)     13           --       99
c6288     21      21 (100%)    0            0        21*
c7552     192**   149 (78%)    16           --       156

Table III. Results for Genetic ATPG Tests

circuit   tests   unique       noncontrib   search   result
c432      51      45 (88%)     3            3        46*
c499      86      86 (100%)    0            0        86*
c880      63      43 (68%)     15           5        45*
c1908     132     109 (83%)    16           7        112
c2670     107**   72 (67%)     27           8        75
c3540     167     137 (82%)    16           14       143
c5315     132     100 (76%)    19           13       106
c6288     24      21 (88%)     1            2        22*
c7552     249**   143 (57%)    41           65       164

Table IV. Results for Random ATPG Tests

* - global minimum was detected and proved

** - initial test set did not cover all the testable faults

-- - entry not legible


The proposed algorithm achieved very effective compaction results on the ISCAS'85
benchmarks: for 10 of the 27 test sets, global minima were detected and proved within a
few hundredths of a second. For the rest of the minimized test sets we cannot claim with
certainty that they are global minima of the static compaction problem. However, a simple
postprocessing step showed that the obtained test sets did not contain any dispensable
patterns [9]. Thus, all of the test sets are reducts [9].
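
The postprocessing check mentioned above can be illustrated with a minimal Python sketch. It assumes that a fault-detection table is available as a mapping from each test pattern to the set of faults it detects; the function name, the input format and the toy data are hypothetical and are not taken from the paper. A test set is a reduct in this sense when the returned list is empty.

```python
def dispensable_patterns(detects):
    """Return the patterns whose removal leaves fault coverage unchanged.

    detects: dict mapping a pattern name to the set of faults it detects
    (hypothetical input format, assumed for this sketch).
    """
    all_faults = set().union(*detects.values()) if detects else set()
    dispensable = []
    for pattern in detects:
        # Faults still covered when this one pattern is dropped.
        rest = set().union(*(f for p, f in detects.items() if p != pattern))
        if rest == all_faults:
            dispensable.append(pattern)
    return dispensable


# Toy example: t3 only detects a fault that t1 already covers.
compact = {"t1": {"f1", "f2"}, "t2": {"f3"}}
redundant = {"t1": {"f1", "f2"}, "t2": {"f3"}, "t3": {"f2"}}
print(dispensable_patterns(compact))    # []
print(dispensable_patterns(redundant))  # ['t3']
```

An empty result only shows that no single pattern can be dropped; as the text notes, it does not prove that the set is a global minimum.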

Abstract¹  This paper presents a hierarchical test generation technique
for embedded systems containing hardware and software. The technique is
applied to the system-level specification of such systems. Unlike
traditional approaches, it handles the hardware and software parts of an
embedded system in a uniform way. In particular, we show how the
proposed technique can be applied at high levels of abstraction and how the
software domain of the specification can also be successfully covered.

1.  Introduction 

The development of hardware/software codesign techniques has made it possible to
design the hardware and software of an embedded system at high levels of abstraction in a
uniform way. However, the testing of the hardware and the testing of the software parts of
the system are still considered totally unrelated problems and are solved with very
different methods.

In the early phases of the design cycle, system synthesis is performed starting from an
implementation-independent specification. Reasoning about testability at this level can be
facilitated by a uniform test generation technique which is applicable to both the hardware
and software domains. Test generation and testability analysis for this particular problem
have been studied in [1], [4-5], [13], but not many efficient techniques have been developed yet.

In  our  approach,  testability  evaluation  and  test  generation  at  the  system  level  are  based  on 
hierarchical  test  generation  (HTG)  [9].  We  apply  HTG,  using  a  decision  diagram  (DD)  [12] 
based  representation,  and  show  that  it  can  be  used  for  both  the  hardware  and  software 
domains  as  well  as  for  different  levels  of  abstraction. 


2.  Hierarchical  Test  Generation  for  Hardware/Software  Systems 

Test generation has been proven to be an NP-complete problem [6]. A lot of research has
been devoted to solving the test generation problem for gate-level circuits. Working at
this level provides tests of very high quality but is computationally very expensive for
large circuits and therefore not usable in practice. Several approaches have been
developed to handle test generation for relatively large combinational circuits in a
reasonable time. Test generation for large sequential circuits remains, however, an
unsolved problem, despite the rapid increase in computational power. Hierarchical test
generation has been proposed as one possible solution [2, 8, 10].

Figure 1. Test generation and testability analysis in a hardware/software co-design environment

To give the designer an opportunity to perform design for testability already in the
early design stages, testability evaluation should be applied directly to the system
specification. A testability metric should also be part of the cost function considered
during system-level synthesis, in particular for hardware/software partitioning.

¹ This work was partially supported by the Swedish National Board for Industrial and Technical Development (NUTEK).

Figure  1  shows  how  testability  evaluation  and  test  generation  fit  into  such  a  system 
synthesis  concept.  As  discussed  in  section  1,  testability  evaluation  and  test  generation  are 
performed  on  a  high  level  implementation-independent  representation  and  they  provide 
results  to  be  interpreted  and  used  in  a  coherent  way  for  both  the  hardware  and  software 
partitions. 

3.  HTG  for  Specifications  to  be  Implemented  in  Software 

In  our  approach,  decision  diagrams  are  used  for  design  modeling  at  the  high  abstraction 
levels.  The  main  advantage  of  modelling  with  DDs  lies  in  the  fact  that  a  uniform  concept  can 
be  applied  on  different  abstraction  levels. 

An extended overview of DDs is presented in [12].

Our main objectives are to show how DDs can be used for test generation at the
behavioural level and how HTG can be used for testing the part of the system which will
finally be implemented as software. A hierarchical test generation technique for hardware
has been reported in [8].

At this level, a data-flow DD is generated for every internal variable and primary output
of the design. Terminal nodes of the data-flow DD represent arithmetic expressions.
Further, an additional DD which describes the control flow has to be generated. The
control-flow DD describes the succession of statements and branch activation conditions.
Figure 2 depicts an example of a DD describing the behavior of a simple function. For
example, variable A will be equal to IN1+2 if the system is in the state q=2 (Figure 2c).
If this state is to be activated, condition IN1≥0 should be true (Figure 2b). The DDs
extracted from a specification will be used as a computational model in HTG for symbolic
path activation.

3.1.  Test  Generation  Algorithm 

There  are  two  types  of  tests  which  we  consider  in  the  current  approach.  One  set  targets 
nonterminal  nodes  of  the  control-flow  DD  (conditions  for  branch  activation)  and  the  second 
set  aims  at  testing  operators,  depicted  in  terminal  nodes  of  the  data-flow  DD. 

The whole test generation task is performed in the following way. Tests are generated
sequentially for each nonterminal node of the control-flow DD. Symbolic path activation is
performed and functional constraints are extracted. Solving the constraints gives us the path
activation conditions to reach a particular segment of the specification. In order to test the
operations presented in the terminal nodes of the data-flow DD, different approaches can be
used. In this paper, we use mutation testing [4] for test generation for the operations at the
terminal nodes. For path activation, a slightly modified version of the algorithm described in
[8] is used.

    if (IN1 < 0) then
        A := IN1 * 2;      -- q=1
    else
        A := IN1 + 2;      -- q=2
    endif;
    B := IN1 * 29;         -- q=3
    A := B * 2;
    B := A + 43;           -- q=4

a) Specification (comments start with --)

b) The control-flow DD

c) Data-flow DD (q denotes the state variable and q' is the previous state)

Figure 2. A DD example

3.2.  Conformity  Test 

For the nonterminal nodes of the control-flow DD, conformity tests will be applied. The
conformity tests target errors in branch activation. In order to test nonterminal node IN1
(Figure 3), one of the output branches of this node should be activated. Activation of the
output branch means activation of a certain set of program statements. In our example,
activation of the branch IN1<0 will activate the branches in the data-flow DD where q=1
(A:=X). For observability, the values of the variables calculated in all the other branches of
IN1 have to be distinguished from the values of the variables calculated by the activated
branch. In our example, node IN1 is tested, in the case of IN1<0, if X≠Y. The path from the
root node of the control-flow DD to the node IN1 has to be activated to ensure the execution
of this particular specification segment. The conditions generated here should be justified
to the primary inputs of the module. This process will be repeated for each output branch of
the node. In the general case there will be n(n-1) tests for every node, where n is the
number of output branches.
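
The conformity test for the two-branch node IN1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the branch table is taken from the if/else part of the specification in Figure 2, while the function name, the data layout and the brute-force search over a small input range are hypothetical and stand in for the symbolic path activation and constraint solving used by the actual generator.

```python
from itertools import permutations

# Branches of node IN1 from the Figure 2 specification:
# (activation condition, value assigned to A on that branch).
BRANCHES = {
    "IN1<0": (lambda x: x < 0, lambda x: x * 2),
    "IN1>=0": (lambda x: x >= 0, lambda x: x + 2),
}


def conformity_tests(branches, candidates):
    """For each ordered pair (activated, other) pick an input that activates
    the first branch and yields a different value than the other branch."""
    tests = {}
    for active, other in permutations(branches, 2):
        cond, value = branches[active]
        _, other_value = branches[other]
        for x in candidates:
            if cond(x) and value(x) != other_value(x):
                tests[(active, other)] = x
                break
    return tests


tests = conformity_tests(BRANCHES, range(-4, 5))
print(tests)  # {('IN1<0', 'IN1>=0'): -4, ('IN1>=0', 'IN1<0'): 0}
```

With n = 2 branches the sketch produces n(n-1) = 2 tests. The input IN1 = 2 would activate the second branch but is useless as a test, because both branches then compute the same value (2+2 = 2*2).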

3.3.  Testing  Arithmetic  Operators 

As mentioned earlier, test vectors for the terminal nodes can be generated based on
different approaches, and our HTG technique does not impose a specific one. Currently we
use a mutation-based fault model [4] for testing terminal nodes of the data-flow DD. We are
using a library of operator mutations, which describes for each operator a set of corresponding
mutants and the conditions which can distinguish between the mutant and the original operator.
Suppose we have the expression x := (a+b) - c. To rule out the fault that the first "+" is
changed to "-", b must not be 0 (because a+0 = a-0). Additionally, to rule out the fault that
instead of "+" there is "*", we have to assure that a+b ≠ a*b. For more details about operator
mutants, we refer the reader to [7].
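
The two distinguishing conditions in the example can be checked mechanically. The following Python sketch is a minimal illustration, not the mutant library used in the paper: the `MUTANTS` table lists only the two mutations discussed in the text, and its structure and the function name are assumptions.

```python
import operator

# Hypothetical mutant library: for each operator, the operators it may be
# replaced by (only the two mutations discussed in the text are listed).
MUTANTS = {operator.add: [operator.sub, operator.mul]}


def kills_all_mutants(op, a, b):
    """True if inputs (a, b) give a different result for every mutant of op."""
    return all(op(a, b) != mutant(a, b) for mutant in MUTANTS[op])


print(kills_all_mutants(operator.add, 3, 0))  # False: 3+0 == 3-0
print(kills_all_mutants(operator.add, 2, 2))  # False: 2+2 == 2*2
print(kills_all_mutants(operator.add, 3, 5))  # True
```

A test vector for the terminal node is acceptable only if it satisfies all such conditions at once, which is why b = 0 and a = b = 2 are rejected here.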

4.  Experimental  Results 

Experiments  were  conducted  in  the  environment  consisting  of  our  hierarchical  test 
generator,  the  library  of  mutants  for  different  arithmetic  operators,  and  the  Generic  Coverage 
Tool  (GCT)  [11]  which  measures  the  quality  of  the  generated  test  cases.  Conversion  between 
different  representations  (VHDL,  C,  Fortran  and  DD)  is  performed  by  corresponding 
translation  tools.  In  order  to  evaluate  our  results  we  compare  them  with  those  produced  by  the 
software  test  generation  tool  Mothra  [3].  Experiments  were  carried  out  on  three  embedded 
software examples. Table 1 presents the experimental results of our approach in comparison
with the results achieved by Mothra. The achieved fault coverage combines several
different coverage criteria (statement coverage, branch coverage, loop coverage, etc.).
As observed, the mutation-based testing tool Mothra generates a much larger set of test
vectors which, at the same time, achieves weaker coverage.


Figure  3.  Conformity  test 
                                          test cases   coverage   Number of generated   Fault
                                                                  test cases            coverage
Square     38     12     813     707          5         77.65%           10             94.12%
Mult       20      6     478     449          3         84.00%            6             90.00%
FFT        31      4    1682    1639          4         83.91%            6             86.21%

Table  1.  Experimental  results 


5.  Conclusions 

This  paper  describes  a  novel  hierarchical  framework  for  test  generation  in  hardware/software 
systems.  Hardware  and  software  parts  of  an  embedded  system  can  be  handled  in  a  uniform  way. 
The  same  DD  representation  can  be  used  for  describing  systems  at  different  abstraction  levels, 
including  the  system  level.  Based  on  this  representation,  reasoning  about  testability  in  the  early 
design  phases  and  test  generation  for  both  the  hardware  and  the  software  domain  is  possible.  We 
have  shown  how  HTG  can  be  applied  at  the  high  levels  of  abstraction  and  how  the  software 
domain of the specification can be successfully covered. Experimental results have shown that the
quality of the generated test vectors is even better than that of the vectors produced by dedicated
software test generation tools. Our future research is to integrate the testability metrics based
on HTG into a hardware/software codesign environment.

Abstract¹

A technique to schedule tests for complex digital systems is proposed where the test
application time is minimized while the power dissipation is kept under control. The
technique is based on estimations of the test application time and the power consumption.
Experiments show the efficiency of our approach.

1.  Introduction 

Development of microelectronic technology has led to the implementation of the system on chip (SOC), where a
complete system, consisting of several ASICs, microprocessors, memories and intellectual property (IP) blocks, is placed
on a single chip. Such a system is usually made testable by the introduction of some design-for-testability (DFT)
mechanism. Several DFT techniques such as test point insertion, scan and different types of built-in self-test (BIST)
have been developed [1]. For a complex SOC design it is not unusual to combine several test techniques since they all
have their respective advantages and disadvantages. Furthermore, when IP blocks are used, they may already contain a
test methodology which is different from the rest of the design and which has to be incorporated in the overall test
strategy of the whole system.

There are many similarities in testing PCBs (printed circuit boards) and SOCs. The major difference is, however,
twofold. In a PCB, testing of each individual component can often be carried out before mounting, and the components
can be accessed for test via probing. Neither of these is possible when testing an SOC. This means that testing the
completed system becomes even more crucial and difficult. In order to keep test application time at a minimum, it is
desirable to apply as many tests as possible concurrently. However, the power consumption must be kept under control.
Otherwise, the chip could be damaged. Furthermore, due to resource conflicts, it is usually not possible to apply all
tests concurrently.

Much research has focused on the scheduling problem in digital system design [6]. The main question has been to
find a schedule that fulfills a set of constraints and minimizes a given cost function. In high-level synthesis the cost
function has traditionally been an area performance function. Today, new approaches have been proposed where
issues such as testability and power consumption are considered. In scheduling for low power, the scheduling is
performed to find a design which minimizes the power consumption. These techniques focus on the normal mode and not
on the test mode. The power consumption in the test mode can be much higher than during normal operation due to the
high switching activity [2]. Therefore, it is very important to consider power consumption in the test mode. A major
problem in synthesis for low power is that the switching activity depends on the input data. Several techniques have
been proposed to estimate the input data.

A schedule has to be developed to determine the order of the tests. A dedicated test controller for a BIST-based
system is proposed by Zorian [5]. A general test scheduling algorithm is proposed by Chou et al. [4] and another test
scheduling approach is proposed by Stroele et al. [3].

In this paper we assume a design with added DFT features. We propose a technique to estimate the test application
time and the power consumption. Our approach is based on estimating the needed test vectors for the design. We
propose a test scheduling technique which is based on these estimations. Our scheduling technique minimizes the test
application time while the power dissipation is kept under control.

The rest of the paper is organized as follows. First we present the technique to determine which test will be used to
test a resource in Section 2. In Section 3 the estimation of test application time is presented and the technique to
estimate the test power consumption is given in Section 4. Our scheduling technique is presented in Section 5.
Experimental results are in Section 6 and finally we conclude the paper in Section 7.

2.  System  Model  and  Test  Strategy 

A digital system can be seen as a set of interconnected blocks. To test it, a set of tests is applied, and to minimize the
test application time all tests should be applied as concurrently as possible. However, it might not be possible to apply
the tests concurrently due to high power dissipation, which might damage the circuit under test. Furthermore, there
might exist different types of conflicts among tests and resources that restrict the ability for concurrent testing.

Figure 1: Tests, resources, and constraints.

Figure 2: A Scan example.

observable paths.

1.  This work has partially been supported by the Swedish National Board for Industrial and Technical Development (NUTEK).

A resource graph can be used to describe the relationship between the tests and the IP blocks [4]. An example of a
resource graph is given in Figure 1(a), where an edge from test t_i to resource r_j indicates that t_i tests r_j. In
Figure 1(a), tests t1 and t2 cannot be executed at the same time since both test resource r2. In practice, we note that
an edge connecting t_i and r_j in a resource graph could actually mean one or both of the following situations:

•  A test t_i tests a resource r_j.

•  When a test t_i is applied, resource r_j is required for test isolation and/or test access.

We introduce the constrained resource graph, where a constraint level is added in order to distinguish between the
two cases. If a resource is needed for reasons other than being tested, it is placed on the constraint level. For
instance, if resource r2 in Figure 1(a) is needed in order to apply test t2, we place it on the constraint level, see
Figure 1(b).

Our second observation is that a resource has to be tested only once. We introduce a technique to determine which
test should be used to test a resource. For instance, resource r3 in Figure 1(a) can be tested by either t2 or t3. In
Figure 3, logic3 can be tested using the primary input PI1, scan chain t1, or scan chain t2. If PI1 is used for the test,
the test vectors have to pass all the logic from PI1 to logic3. Furthermore, if PI1 is used for the test of logic3, neither
scan chain t1 nor scan chain t2 can be used concurrently. On the other hand, if scan chain t1 is used to test logic3, the
test vectors have to pass through less logic, and PI1 and t2 can be used to test other units at the same time. We use the
Shortest Controllable Path and Shortest Observable Path to determine which test t_i should be used to test a resource
r_j [8]. Let G(V, E) be a directed graph where a vertex v ∈ V corresponds to a functional unit (operation). A start
vertex is a vertex which gets its value directly from a primary input or a test access port. An end vertex is a vertex
connected to a primary output or a test access port. A path P_i is a sequence of edges
{(v0, v1), (v1, v2), ..., (v_{n-1}, v_n)} between v0 and v_n. The Shortest Controllable Path, SCP(op), for an operation
op is the shortest path from a start vertex to op, and the Shortest Observable Path, SOP(op), is the shortest path from
an operation op to an end vertex.

For instance, in Figure 3, SCP(logic3) = {(t1, R3), (R3, logic3)} and SOP(logic3) = {(logic3, R6), (R6, PO2)}. The
first element on the SCP is where the test vectors are applied. The constraints in the constrained resource graph are the
vertices on the SCP and SOP. According to our approach, logic3 is tested via t1, and R3, R6 and PO2 are in the set of
constraints.
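
Since every edge counts equally, SCP and SOP can be found with a breadth-first search. The Python sketch below is a minimal illustration under assumptions: the function name and the adjacency-list graph are hypothetical, and the graph is not the real Figure 3 but a small data path chosen so that the result matches the SCP and SOP quoted above.

```python
from collections import deque


def shortest_path(graph, sources, targets):
    """Breadth-first search; returns the edge list of a shortest path from
    any vertex in `sources` to any vertex in `targets`, or None."""
    queue = deque((s, []) for s in sources)
    seen = set(sources)
    while queue:
        vertex, edges = queue.popleft()
        if vertex in targets:
            return edges
        for succ in graph.get(vertex, ()):
            if succ not in seen:
                seen.add(succ)
                queue.append((succ, edges + [(vertex, succ)]))
    return None


# Hypothetical data path, chosen only so that the result matches the
# SCP/SOP quoted in the text; the real Figure 3 is not reproduced here.
GRAPH = {
    "PI1": ["R1"], "R1": ["logic1"], "logic1": ["R2"], "R2": ["logic2"],
    "logic2": ["R3"], "t1": ["R3"], "R3": ["logic3"], "logic3": ["R6"],
    "R6": ["PO2"],
}
START = ["PI1", "t1"]   # primary inputs and test access ports
END = ["PO2"]           # primary outputs and test access ports

scp = shortest_path(GRAPH, START, ["logic3"])
sop = shortest_path(GRAPH, ["logic3"], END)
print(scp)  # [('t1', 'R3'), ('R3', 'logic3')]
print(sop)  # [('logic3', 'R6'), ('R6', 'PO2')]
```

The first vertex of the SCP (here t1) identifies the test that is applied, and the remaining vertices on both paths would go to the constraint level of the constrained resource graph.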


3.  Estimation  of  Test  Application  Time  for  the  Scan  Technique 

An operator is tested by applying a set of test vectors on the inputs and observing the response on the outputs of the
operator. The test application time depends mainly on the number of test vectors and the clock frequency. The total test
application time T(t_i) for a test t_i is given by:

\[ T(t_i) = T_c(t_i) + T_a(t_i) + T_o(t_i) \tag{1} \]

where T_c is the time to set up the test (control), T_a is the time to apply the test and T_o is the time needed to observe
the test response. For the scan technique we use the following definitions:

f_t: clock frequency at test mode; f_n: clock frequency at normal mode; n: number of clock cycles in normal mode.
tv(op): number of test vectors needed to test operation op.
l_in(t_i, op): length of scan chain t_i (number of flip-flops) from the input to operation op.
l_out(t_i, op): length of scan chain t_i (number of flip-flops) from operation op to the output.
Op(t_i): the set of operators tested by test t_i.
max(t_i, tv(op)): the maximal number of test vectors needed by any operator op tested by t_i, where op ∈ Op(t_i).

The test setup (control) time T_c, the application time T_a and the observation time T_o for a test t_i are estimated as:

\[ T_c(t_i) = \frac{l_{in}(t_i, op)}{f_t} \times \max(t_i, tv(op)) \]
\[ T_a(t_i) = \frac{n}{f_n} \times \max(t_i, tv(op)) \]
\[ T_o(t_i) = \frac{l_{out}(t_i, op)}{f_t} \times \max(t_i, tv(op)) \]

where op ∈ Op(t_i).

An example of a scan-based system is given in Figure 2. Assume that [illegible] >= 8, the bit width of all registers to
be 16 bits, n = 1 and f_n = f_t = 10 MHz, and that logic a requires 100 test vectors, logic b 50 and logic c 10. Using the
above formulae the estimated test application time is: 100*(32+32+1)/10^7 s = 0.65 ms.

In the above formulae we assume that we know in which position of the scan chain we have the operator to be tested
by the vectors. If that is not known, we have to scan in the whole vector for the scan chain before a test can be
performed¹. In this case, instead of using l_in(t_i, op) and l_out(t_i, op) we use l(t_i): the length of scan chain t_i
(number of flip-flops).

The total estimated test application time becomes:

\[ T(t_i) = \left( \frac{2\, l(t_i)}{f_t} + \frac{n}{f_n} \right) \times \max(tv(op)) \]

where op ∈ Op(t_i).

Taking the example in Figure 2 with the assumptions above, we estimate the test application time to be 1.29 ms.
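
Both estimates can be reproduced with a few lines of Python. This is a sketch under assumptions: the function name is hypothetical, l_in = l_out = 32 is read off the worked example, and the full chain length of 64 flip-flops is not stated in the text but is assumed here because it reproduces the quoted 1.29 ms.

```python
def scan_test_time(l_in, l_out, n, f_t, f_n, max_tv):
    """T = T_c + T_a + T_o for one scan test, in seconds."""
    t_control = l_in / f_t * max_tv    # shift the vectors in
    t_apply = n / f_n * max_tv         # run n normal-mode cycles per vector
    t_observe = l_out / f_t * max_tv   # shift the responses out
    return t_control + t_apply + t_observe


F = 10e6          # f_n = f_t = 10 MHz, as in the text
MAX_TV = 100      # logic a needs the most vectors (100)

# Position in the chain known: l_in = l_out = 32 flip-flops.
print(round(scan_test_time(32, 32, 1, F, F, MAX_TV) * 1e3, 2))  # 0.65 (ms)

# Position unknown: the whole chain is shifted in and out; a chain length
# of 64 flip-flops is assumed because it reproduces the quoted 1.29 ms.
print(round(scan_test_time(64, 64, 1, F, F, MAX_TV) * 1e3, 2))  # 1.29 (ms)
```

In both cases the vector count is the maximum over the operators tested by the chain, so logic b and logic c do not add to the estimate.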

4.  Estimation  of  Test  Power  Consumption 

The power dissipation in a CMOS circuit consists of a static and a dynamic part. The static power dissipation is due to
leakage current or other current drawn continuously from the power supply, and the dynamic power dissipation is due
to switching transient current and charging and discharging of load capacitances [7].

The static power dissipation and the dissipation due to switching transient current are negligible compared to the
dissipation due to charging and discharging of capacitances. The power dissipation due to charging and discharging
capacitances is given by [7]:

\[ \frac{1}{2} \times V^2 \times C \times f \times \alpha \tag{2} \]

where V is the voltage, C is the capacitance, f is the frequency and α is the switching activity.

All parts but the switching activity in formula (2) are given or can be extracted from a design library. The switching
activity depends on the input data and there are mainly two approaches to estimate it, based on simulation or on
probability. During testing the input to the design is known: it is the test vectors. For each component in a design, we
can use the test vectors generated by an ATPG tool as the input to a power simulation tool and then build a library
with the test power consumption for each unit in the design.
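
Formula (2) is linear in the switching activity, which is why the test mode, with its higher activity, can dissipate much more power than the normal mode. A minimal Python sketch, with a hypothetical function name and made-up electrical values that are not taken from the paper:

```python
def dynamic_power(voltage, capacitance, frequency, activity):
    """Formula (2): 1/2 * V^2 * C * f * alpha, in watts."""
    return 0.5 * voltage**2 * capacitance * frequency * activity


# Hypothetical unit: 3.3 V supply, 50 pF switched capacitance, 10 MHz clock.
for alpha in (0.1, 0.5):
    print(alpha, dynamic_power(3.3, 50e-12, 10e6, alpha))
```

In the approach described above, the activity would not be guessed as in this sketch but obtained by simulating each unit with its ATPG-generated test vectors.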

5.  Test  Scheduling  Algorithm 

In this section we propose a test scheduling algorithm which minimizes the test application time while keeping power
dissipation under control. It assumes that the test application time and the test power dissipation have been estimated
for all units.

If two blocks are scheduled for test starting at the same time, it is likely that one of the tests is completed before the
other because the tests are of unequal length. This releases resources and a new test can be scheduled. However, if all
tests in a test session are fully executed before tests from the next session are scheduled, the complexity of the test
controller is minimized [4]. We present a test scheduling algorithm in Figure 4. The variable time in the algorithm
refers to the test session number.

At steps 1 and 2 in the algorithm we sort the tests according to estimated power consumption, see Figure 4. At step
4 we have a loop which terminates when all tests are scheduled, and at step 7 we have an embedded loop where we try
to schedule as many tests as possible in a specific test session. At step 8, we check if we can schedule the current test
in this test session. We make sure that the power consumption is kept under the limit and we check the constrained
resource graph in order to avoid conflicts between tests in the present test session.

In order to handle unequal test lengths, the following step is modified:

(15)  time = time + min(scheduled units);

At step 15 we set the time for the next schedule to be the time when the shortest scheduled test is completed.


1.  It  is  obvious  that  scan  in  and  scan  out  can  be  performed  simultaneously.  However,  here  we  are  interested  in  the  relative  test 
application  time  for  different  blocks  in  the  design.  Therefore,  we  use  an  approach  with  low  computation  complexity. 
(1)   Sort the tests according to (estimated) power consumption;
(2)   Place the tests in a list with the test consuming most power first;
(3)   time=0;
(4)   while not empty list do begin
(5)     pwr(time)=0;
(6)     cur=first_test;
(7)     while not at end of list do begin
(8)       if pwr(time)+pwr(cur)<Max_pwr and no constraint(cur, scheduled) then begin
(9)         schedule(cur, time);
(10)        remove_from_list(cur);
(11)        pwr(time)=pwr(time)+pwr(cur);
(12)      end;
(13)      cur=next_test;
(14)    end;
(15)    time=time+1;
(16)  end;

Figure 4: Test scheduling algorithm.
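
The session-based algorithm of Figure 4 can be sketched in Python as shown below. This is an illustration under assumptions, not the authors' implementation: the function names, the data layout and the representation of conflicts as unordered pairs are hypothetical. The power and test-length values are those of Table 1; the ROM2 test length is not legible there, so the value 102 (the same as ROM1) is assumed.

```python
def schedule_tests(tests, max_power, conflicts=frozenset()):
    """Greedy session-based scheduling following the steps of Figure 4.

    tests: dict name -> (power_mW, test_length)
    conflicts: set of frozenset({a, b}) pairs that must not run together
    Returns a list of sessions, each a list of test names.
    """
    # Steps 1-2: list the tests with the most power-hungry one first.
    pending = sorted(tests, key=lambda name: tests[name][0], reverse=True)
    sessions = []
    while pending:                                       # step 4
        session, used = [], 0                            # step 5
        for name in list(pending):                       # steps 6, 7, 13
            power = tests[name][0]
            clash = any(frozenset((name, s)) in conflicts for s in session)
            if used + power < max_power and not clash:   # step 8
                session.append(name)                     # step 9
                pending.remove(name)                     # step 10
                used += power                            # step 11
        if not session:
            raise ValueError("a single test exceeds the power limit")
        sessions.append(session)                         # step 15
    return sessions


def total_length(tests, sessions):
    """Each session lasts as long as its longest test."""
    return sum(max(tests[name][1] for name in s) for s in sessions)


# Active power (mW) and test length from Table 1. The ROM2 test length is
# not legible in the table; 102 (same as ROM1) is an assumption.
ASIC_Z = {
    "RL1": (295, 134), "RL2": (352, 160), "RF": (95, 10),
    "RAM1": (282, 69), "RAM2": (241, 61), "RAM3": (213, 38),
    "RAM4": (96, 23), "ROM1": (279, 102), "ROM2": (279, 102),
}

sessions = schedule_tests(ASIC_Z, max_power=900)
for number, session in enumerate(sessions):
    print(number, session)
print(total_length(ASIC_Z, sessions))  # 300
```

With a 900 mW limit and no resource conflicts, the sketch produces three sessions with a total test length of 300, which agrees with the value reported for this approach in Section 6.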


Block   Size            PB/active (mW)   PB/idle (mW)   Test Length
RL1     [illegible]     295              -              134
RL2     16000 gates     352              -              160
RF      64*17 bits       95              19              10
RAM1    768*9 bits      282              20              69
RAM2    768*8 bits      241              17              61
RAM3    768*5 bits      213              11              38
RAM4    768*3 bits       96               7              23
ROM1    1024*10 bits    279              23             102
ROM2    1024*10 bits    279              23             [illegible]

Table 1: Power dissipation and estimated test length for ASIC Z.
Table 2: A comparison of different test scheduling approaches.


6.  Experimental  Results 

We have used the ASIC Z [5] example to compare the scheduling algorithm presented in this paper with other
approaches. The example consists of 9 blocks, see Table 1, where the estimation of the test length is made by Chou et al.
based on the size of the blocks [4]. We use the same assumptions as Chou et al., namely that the maximal power
dissipation is limited to 900 mW, all tests can be applied concurrently, the power consumption of idle blocks is
excluded, and no test can be started until all tests in the previous session are completed. In Table 2 we compare our
test scheduling technique with Zorian's solution [5] and the approach proposed by Chou et al. [4]. Using our approach
the total test length (application time) is 300, while Zorian's approach gives 392 and the approach of Chou et al. gives 331.

7.  Conclusions 

The high complexity of digital designs has increased the need for design for testability techniques and the need to
combine several DFT techniques. In order to minimize the test application time while keeping the test power consumption
under control, efficient test scheduling is required. In this paper, we have discussed how to estimate the test application
time and the test power consumption. We propose a test scheduling technique, and experimental results show the
efficiency of our approach.

Abstract. This paper deals with test pattern generation based on a
deterministic algorithm and a fault simulation technique for combined IDDQ
and voltage testing. A deterministic test set is generated covering stuck-at,
stuck-on and stuck-off faults and some inside-gate shorts. The test set consists
of combined current-voltage and pure voltage test patterns. The TPG system
runs in two phases: the first is test generation covering the optimal test
patterns for every primitive cell in a tested circuit, and the second is real
defect detection and localisation over one selected defect library.
Experiments using ISCAS'85 benchmark circuits have been done
for the first phase of the ATPG system.


I.  Introduction 

Many  defects  causing  bridges,  breaks,  and  transistor  stuck-on  faults  in  CMOS 
circuits  are  not  detected  by  a  test  set  generated  using  the  traditional  single  stuck-at  fault 
model.  Each  physical  defect  should  be  covered  by  the  test  method  that  leads  to  the  lowest 
overall  testing  costs,  taking  into  account  e.g.  the  complexity  of  the  test  generation,  and  the 
test  application  time.  The  length  of  the  test  sequence  and  the  fault  coverage  of  the  test  set 
which  can  be  achieved  contribute  to  the  quality  of  the  tested  circuit.  Certain  types  of  CMOS 
defects  are  detected  by  current  test  patterns  or  by  voltage  test  patterns  only,  and  others  by 
both [1]. This motivated the idea of using a combined test set for Iddq and voltage testing,
which could help to improve the test quality of complex CMOS circuits. A combination of
functional and Iddq test patterns can create an effective test set. Up to now, various Test
Pattern Generation (TPG) techniques for Iddq testing have been presented [e.g. 2, 3, 4, 5].
They are either TPG techniques for Iddq testing only, or classical TPG algorithms for
stuck-at faults extended by current test pattern generation for those defects which are not
detected by the voltage test patterns.

In this paper, an in-house experimental ATPG system is presented. It is based on
deterministic and random test generation techniques running with a fault simulator, and
the test set is generated concurrently for Iddq and voltage testing. The test set then consists
of combined current and voltage test patterns and of pure voltage test patterns. The system
can also be used for various experiments in test pattern generation for combinational circuit
testing. Some parts of the presented TPG system (random and deterministic TPG based on
the critical path tracing technique) were published in [7-9]. A general description of the
ATPG system is given in the next part together with some results using ISCAS'85
circuits. Algorithms for real defect coverage computation and real defect localisation are
based on an analysis of a list of undetected fault conditions and a list of detected fault
conditions for the tested circuit over the selected defect library. The basic idea of the
algorithms is presented in the second part of the paper. The algorithms are currently being
implemented.


II.  TPG  System  for  Iddq  and  Voltage  Testing 

The ATPG system generates test patterns for a chosen fault list from the fault
conditions cell library using two TPG approaches, random and deterministic, which run
together with the fault simulator. Generally, the system can generate the optimum test set as
a function of the constraints given for a cell library, which contains data regarding the
detection of the possible faults/defects in a cell. The evaluation of a test set is based on its
defect coverage. The following types of faults are considered in this system at the structural
level, using six different defect libraries based on electrical simulation results:

•  stuck-at,  stuck-open,  stuck-on 

• transistor shorts (intra-gate shorts): the transistor short model considers shorts between
the four terminals of a transistor: source (s), drain (d), gate (g), bulk (b). There are six such
shorts for every transistor.
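
The count of six shorts follows from choosing two of the four terminals. A minimal sketch of this enumeration is given below; the names `TERMINALS` and `transistor_shorts` are illustrative only and are not part of the ATPG system.

```python
from itertools import combinations

# Transistor terminals as named in the text: source, drain, gate, bulk.
TERMINALS = ("s", "d", "g", "b")

def transistor_shorts(terminals=TERMINALS):
    """Return every unordered pair of terminals, i.e. every possible short."""
    return list(combinations(terminals, 2))

shorts = transistor_shorts()
print(len(shorts), shorts)
```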

An implicit fault model is used in the TPG system. The fault model is defined as a set
of fault conditions, i.e. the logical values which have to be applied to every primitive logic
gate to cover the mentioned faults. Real defect coverage is computed from the undetected
fault conditions and the specified defect library. The TPG system runs in two phases: in the
first, test patterns are generated to cover the optimal test patterns for every primitive cell;
in the second, real defect coverage computation and defect localisation are applied over the
selected defect library. The first phase of the test generation process is split into the
following parts:

•  The  pre-process  which  consists  of  compilation  of  the  circuit  description  into  an  internal 
form,  selection  of  a  design  style,  a  testing  mode  and  of  a  fault  conditions  list  from  the 
fault  conditions  cell  library.  Description  of  the  tested  circuit  uses  the  language  from  the 
ISCAS’85  circuits.  The  testability  measures  (used  in  the  deterministic  TPG)  are  computed 
before  test  pattern  generation. 

•  The  main  part  consists  of  the  random  TPG,  the  deterministic  TPG  (based  on  FAN 
strategies)  and  the  fault  simulator. 

•  The  post-process  consists  of  the  fault  condition  coverage  and  the  statistics  computation, 
the  creation  of  output  files  (a  test  set,  a  list  of  undetectable  fault  conditions  and/or  a  list  of 
detectable  fault  conditions). 

The critical path tracing technique was implemented for current and voltage test
pattern generation in the first ATPG prototype [7,8]. The new deterministic test generation
algorithm based on FAN strategies has been implemented in the presented ATPG system
because many test patterns for voltage testing cannot be covered during critical path
tracing. The number of backtracks can be specified at the beginning of the test generation
process and it can also be increased during test generation according to the fault
condition coverage achieved.

The following table presents results using both random and deterministic
approaches for test pattern generation using ISCAS'85 benchmark circuits. The number of
non-active test patterns was set to 5 000 test vectors during random test generation.


           Random + fault simulation                 Deterministic + fault simulation
Circuit    %       test   C/V    V      C    time    %       test   C/V    V      C    time
c432       95.82   80     42     38     0    00:09   95.82   84     35     47     2    00:07
c499       99.43   103    76     27     0    00:20   99.44   137    83     48     6    00:26
c880       100     120    56     64     0    00:58   100     81     14     66     1    00:06
c1355      99.74   121    103    18     0    01:01   99.75   133    82     47     4    01:25
c1908      99.68   186    143    43     0    02:13   99.78   171    115    51     5    01:03
c2670      89.07   123    77     46     0    02:23   96.79   158    123    28     7    04:41
c3540      96.85   264    184    80     0    06:19   97.08   151    41     107    3    18:01
c5315      99.47   238    90     148    0    06:13   99.45   229    32     196    1    07:50
c6288      99.40   48     44     4      0    03:42   99.21   62     16     45     1    14:30
c7552      97.07   314    115    199    0    08:26   98.56   216    38     177    1    19:21

Table 1: ISCAS experiments (C/V - current and voltage patterns)
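
The random part of the flow can be sketched as a loop that stops once a given number of consecutive patterns cover nothing new. This is only a sketch under stated assumptions: "non-active" is read here as "covering no new fault condition", and `random_tpg`, `covered_by` and the toy fault conditions are hypothetical names, not the interface of the ATPG system.

```python
import random

def random_tpg(num_inputs, covered_by, fault_conditions, max_inactive=5000, seed=1):
    """Generate random patterns until all conditions are covered or
    max_inactive consecutive patterns cover nothing new (assumed criterion)."""
    rng = random.Random(seed)
    remaining = set(fault_conditions)
    test_set = []
    inactive = 0
    while remaining and inactive < max_inactive:
        pattern = tuple(rng.randint(0, 1) for _ in range(num_inputs))
        # covered_by plays the role of the fault simulator in this sketch.
        newly_covered = covered_by(pattern) & remaining
        if newly_covered:
            test_set.append(pattern)      # keep only patterns that add coverage
            remaining -= newly_covered
            inactive = 0
        else:
            inactive += 1
    return test_set, remaining

# Toy example: one 2-input gate whose four input combinations
# are taken as its fault conditions.
conditions = {(0, 0), (0, 1), (1, 0), (1, 1)}
tests, uncovered = random_tpg(2, lambda p: {p}, conditions)
print(len(tests), uncovered)
```

Fault conditions left in `uncovered` would then be handed to the deterministic generator.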


III.  An  Algorithm  for  Defect  Localisation 

The second phase of the ATPG system consists of defect diagnosis with real defect
coverage computation and defect localisation algorithms. The fault coverage obtained from the
first phase of the TPG process reflects only the coverage of fault conditions for all gates in the
tested circuit to which the TPG is applied. This means that a test set generated by the system
is only a detection test set. Therefore two further algorithms have been proposed and
implemented for defect coverage computation and defect localisation. The algorithms work on
one defect library selected from the six types and on one user's defect library. The user's
library, covering the types of defects and the test patterns for every gate, can be specified by
the user. The second phase of the ATPG system requires the following files:

•  the  list  of  undetected  fault  conditions 

•  circuit  statistics 

•  the  test  set  or  the  list  of  covered  fault  conditions. 

These files are outputs of the first phase of the TPG system. The first input file
consists of some statistical data about the tested circuit:

• type of technology used (it is important for the selection of a suitable defect library for
every type of gate)

• number of gates of each type (for calculating the number of defects)

• number of lines and fan-out points.

The second input file is the list of undetected (voltage and/or current) fault conditions
related to the gate specified by its type and name. Defect libraries for basic cells with two
inputs (NAND, AND, OR, NOR, EXOR) and for NOT and BUFF were obtained from
electrical simulation for six technologies.

The algorithm for defect localisation is being implemented in the same way as the
defect coverage computation, based on large defect dictionary files using fault
conditions and fault condition libraries created by the ATPG system. The type of defect
library depends on the technology used and on the individual basic logic gates. The
defects covered by voltage or current test patterns are deleted from the list of defects for the
tested circuit. Both current and voltage libraries are analysed, but the voltage defect library
is analysed first, because the testing time for voltage pattern application is lower than for
current pattern application. Results from the second phase using the ISCAS'85 benchmark
circuits can be presented after its implementation and connection to the ATPG prototype.
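
The deletion of covered defects, voltage library first, can be sketched as follows. The library layout (a mapping from each defect to the fault conditions that detect it) and all names in the example are assumptions made for illustration; the paper does not specify the file formats.

```python
def analyse_defects(defects, voltage_lib, current_lib,
                    covered_voltage, covered_current):
    """Remove defects detected by covered fault conditions, voltage first.

    voltage_lib / current_lib: hypothetical dicts mapping a defect name to
    the set of fault conditions that detect it in that test mode.
    """
    remaining = set(defects)
    # Voltage library first, since voltage patterns are cheaper to apply.
    by_voltage = {d for d in remaining
                  if voltage_lib.get(d, set()) & covered_voltage}
    remaining -= by_voltage
    by_current = {d for d in remaining
                  if current_lib.get(d, set()) & covered_current}
    remaining -= by_current
    total = len(set(defects))
    coverage = 100.0 * (total - len(remaining)) / total
    return by_voltage, by_current, remaining, coverage

# Hypothetical defects and fault conditions for a single 2-input gate.
defects = ["short_sd", "short_gd", "open_in"]
voltage_lib = {"short_sd": {"11"}, "short_gd": {"01"}}
current_lib = {"short_gd": {"10"}, "open_in": {"00"}}
by_v, by_c, left, cov = analyse_defects(
    defects, voltage_lib, current_lib,
    covered_voltage={"11"}, covered_current={"10"})
print(by_v, by_c, left, round(cov, 2))
```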


IV.  Conclusion 

In this paper, the TPG system for combined current and voltage test generation
together with defect localisation has been presented. The system has been implemented in the
C language under Windows and Linux. The ATPG system can be used for various
experiments using random, deterministic and fault simulation algorithms for combinational
circuit testing. It is planned to extend the ATPG prototype with a conversion program from
the EDIF format to the ISCAS input format used in the ATPG system. This work has been
supported by the CP94: 0391 UBISTA project - A Unified Built in Self Test Approach for
Full Defect Testing in Mixed Signals, and by the VEGA 2/6091/99 project - Behavioural and
Real Defects Oriented Test Generation for Digital Circuits and Systems.

Abstract: Trigonometric functions are used in many applications
including  real  time  digital  signal  processing  (DSP),  navigation  and 
astronomy.  Today’s  demand  for  fast,  small  and  portable  equipment 
in  those  areas  has  resulted  in  the  need  for  optimised  structures  for 
real-time  processing  of  complex  mathematical  functions.  Therefore, 
the  multipurpose  algorithms  used  traditionally  for  such 
implementations  are  investigated  in  this  paper  and  new  directions 
for  future  implementations  are  presented. 


Keywords: CORDIC Algorithm, Low-Power Design, High-Level CMOS Design.


1  Introduction 


The COordinate Rotation DIgital Computer (CORDIC) Algorithm [1] is traditionally used for the
implementation of trigonometric functions. Volder first introduced the CORDIC Algorithm in
1959. Since then it has been the most popular algorithm for implementing mathematical functions.
With this algorithm only shift and addition operations are required to calculate most
mathematical functions. The basic idea of CORDIC is to take an angle and "rotate" a vector over
this angle towards zero. The CORDIC Algorithm of Volder uses three input variables (x, y, z) and
is based upon the iteration shown in Figure 1.


x_{n+1} = x_n + d_n y_n 2^{-n}

y_{n+1} = y_n - d_n x_n 2^{-n}

z_{n+1} = z_n + d_n \arctan 2^{-n}


Figure  1:  The  Rotation  Steps  of 
the  CORDIC  Algorithm 


Values of arctan 2^{-n}:

1. Step: 45°
2. Step: 26.565°
3. Step: 14.036°
4. Step: 7.125°


The terms arctan 2^{-n} are precomputed and stored, n is the index of the iteration (1, 2,
3, ...) and the value of d_n is either +1 or -1. First, the variables are initialised. Next, the
set of iterative equations is repeatedly applied to these variables until the result converges to
the required accuracy. The accuracy can be controlled by the number of computing steps. If
accuracy is not the main concern, the computation can be stopped after a few
steps. The different functions are selected by the value of d_n. The term d_n is chosen in such
a way that with each step either y or z is driven toward zero. Figure 1 shows the rotation
steps for the case in which y is driven toward zero.
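
The iteration of Figure 1 can be sketched in floating point as follows. This is an illustration rather than the hardware described later, under these assumptions: d_n is taken as the sign of y_n, x starts positive, and the shift index starts at 0 so that the first stored angle is 45°, as in the list of values above. The name `cordic_arctan` is hypothetical.

```python
import math

def cordic_arctan(y, x=1.0, steps=16):
    """Drive y toward zero; z accumulates arctan(y/x) in radians."""
    # Precomputed rotation angles arctan(2^-i), i = 0, 1, 2, ...
    angles = [math.atan(2.0 ** -i) for i in range(steps)]
    z = 0.0
    for i in range(steps):
        d = 1 if y >= 0 else -1            # assumed rule: d = sign(y)
        shift = 2.0 ** -i                  # a right shift by i bits in hardware
        x, y = x + d * y * shift, y - d * x * shift
        z += d * angles[i]
    return z

print(math.degrees(cordic_arctan(1.0)))
```

Each additional step reduces the remaining angle error to at most the last stored angle, which is why the accuracy can be traded against the number of steps.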

2 Implementation of the Arctan

To  show  the  performance  of  the  CORDIC  Algorithm,  the  trigonometric  function  arctan  was 
implemented  in  hardware.  Additionally,  three  further  solutions  were  developed  and  compared  with 
respect  to  error  deviation,  timing  behaviour,  power  consumption  and  area  requirements.  One 
implementation  for  the  arctan  uses  the  CORDIC  Algorithm.  Two  other  implementations  use  a 
Lookup  Table  and  the  last  one  uses  an  approximation  technique.  The  system  input  has  a  bitwidth 
of 7 bits and the output is 6 bits wide. Since the arctan function is an odd
function, the sign of the input value is cut off. The sign can be assigned directly from the input to
the output, because of (1)

arctan  (x)  =  -  arctan  (-x).  (1) 

The input range of the arctan is from -∞ to +∞. The implementation of such a function is only
possible by restricting the input range. The input range is restricted to 0 to 1.984375 with a
step of 0.015625 per bit. The output is defined such that 60° corresponds to an output of 42 decimal.
This results in a resolution of 1.4286° at the output. The input is normalised so that an input value
of 1 corresponds to an output of 60°. Therefore, the input value is multiplied by the factor √3. All
models are implemented as synchronous systems.
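
The number formats described above can be checked with a short sketch. The helper names are hypothetical, and rounding to the nearest output code is an assumption; the text only fixes the step sizes and the 60° to 42 mapping.

```python
import math

INPUT_STEP = 1 / 64        # 0.015625 per input LSB (7-bit magnitude)
OUTPUT_STEP = 60.0 / 42    # degrees per output LSB

def code_to_x(code):
    """Map a 7-bit input code (0..127) to the normalised input value."""
    if not 0 <= code <= 127:
        raise ValueError("input code must fit in 7 bits")
    return code * INPUT_STEP

def reference_output(code):
    """Ideal output code: arctan(sqrt(3) * x) in degrees, in output LSBs."""
    angle = math.degrees(math.atan(math.sqrt(3) * code_to_x(code)))
    return round(angle / OUTPUT_STEP)   # assumed: round to nearest code

print(code_to_x(127), round(OUTPUT_STEP, 4), reference_output(64))
```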

2.1  Using  the  CORDIC  Algorithm 

This model uses the CORDIC Algorithm as described in the previous section. Here, the value of y
corresponds to the input value for the first stage and the value of z in the last stage corresponds to
the output value. The design consists of 10 pipeline stages as shown in Figure 2. The required
shift, addition and subtraction operations are done in each stage, except the first one, where no
shift operation is needed. The fixed angle is stored separately in the calculation stage and is added
to or subtracted from z depending on the value of d_n.


Figure  2:  The  Structure  of  the  implemented  CORDIC  Algorithm 

After several calculation steps, the result is arctan (y/x). Therefore, the value
of x must be set to 1/√3 to achieve the desired result of arctan (√3·y).

2.2 Using a Lookup Table

A Lookup Table (LUT) is a simple storage device. All output values are precomputed and stored in
the LUT. The input is an address in the LUT, which is used to access the output data. There are no
calculations. Therefore, the design is very fast. All output values are chosen so as to minimise the
error. Thus, the error deviation cannot exceed 0.7143° (half of the resolution). The LUT contains 128
values, because there are 7 bits at the input (2^7 = 128).
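
A sketch of how such a table could be precomputed is given below, assuming round-to-nearest output codes and the input and output scaling of Section 2. The function name `build_arctan_lut` is illustrative. With rounding to nearest, the worst-case deviation is bounded by half an output step.

```python
import math

def build_arctan_lut():
    """Precompute the 128 output codes for arctan(sqrt(3) * x), x = code / 64."""
    lut = []
    for code in range(128):
        angle = math.degrees(math.atan(math.sqrt(3) * code / 64))
        lut.append(round(angle * 42 / 60))   # 60 degrees maps to 42 decimal
    return lut

lut = build_arctan_lut()

# Worst-case deviation (in degrees) between stored and exact values.
worst = max(
    abs(lut[c] * 60 / 42 - math.degrees(math.atan(math.sqrt(3) * c / 64)))
    for c in range(128)
)
print(len(lut), max(lut), worst)
```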

2.3 Modifying the LUT

The modified LUT uses the same precalculated and stored data as the LUT, except for the first 23
input values, which are directly assigned to the output. The reason for this assignment is that
for small x, arctan(x) is approximately x. This direct assignment of input to output is
implemented by changing the resolution of both the input and the output. The change in resolution
at the output, with the definition that 60° corresponds to 42 decimal, results in a factor of 1.59. By
multiplying the now adjusted and normalised input value by √3, the approximation arctan (x) ≈
x is valid again. Therefore, the value of the input is assigned directly to the output for input
values from 0000000 to 0010110.
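
The effect of the direct assignment can be illustrated with the sketch below. It assumes that the output code simply equals the input code for the first 23 codes, with the input and output scaling of Section 2; the resolution adjustment mentioned in the text is not modelled, and all names are hypothetical.

```python
import math

DIRECT_CODES = 23   # input codes 0000000 .. 0010110 bypass the table

def exact_code(code):
    """Ideal, unrounded output in output LSBs for a 7-bit input code."""
    return math.degrees(math.atan(math.sqrt(3) * code / 64)) * 42 / 60

def modified_lut_output(code, lut):
    """Pass small inputs straight through, look up everything else."""
    return code if code < DIRECT_CODES else lut[code]

lut = [round(exact_code(c)) for c in range(128)]

# Largest deviation of the pass-through region from the ideal value, in LSBs.
worst_direct = max(abs(c - exact_code(c)) for c in range(DIRECT_CODES))
print(worst_direct < 1.0)
```

Under these assumptions the pass-through region stays within one output LSB of the ideal value, so the corresponding table entries need not be stored.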

2.4  Approximating  the  Arctan 

The fourth implementation is based on a linear approximation of the arctan, which is optimised for
hardware implementation. The characteristic of the arctan is divided into four sections as shown in
Figure 3. Part I is based on the principle of direct assignment as used in the modified LUT before.
Parts II to IV use simple equations, which represent the desired values. All equations use a
multiplier of the form 2^{-n} and one constant, which is added. Therefore, it is easy to
implement these equations in hardware using shift and add operations. The equations
shown in Figure 3 are optimised only for this particular input width and for the function
arctan (√3·x) with the condition that 42 decimal corresponds to 60°.


Section   Input range            Approximation
I.        0 - 0.39063            arctan(√3·x) = x
II.       0.40625 - 0.84375      arctan(√3·x) = 2^{-1}·x + 0.31
III.      0.85938 - 1.25         arctan(√3·x) = 2^{-2}·x + 0.65
IV.       1.26563 - 1.984375     arctan(√3·x) = 2^{-3}·x + 0.9


Figure  3:  The  four  Sections  for  the  Approximations  of  the  Arctan 


3  Results 

In this section the four different implementations of the arctan function are compared. The
designs were written in VHDL and synthesised using Synopsys Design Compiler without any
constraints using an ES2 0.7 µm technology. All implementations were designed to have a
deviation of less than one bit from the theoretical value. All designs have a propagation delay of
less than 10 ns. With respect to timing behaviour there is almost no difference between the version
using the Approximation technique and both Lookup Table versions. However, the Approximation
technique uses 80% less time for the calculation than the CORDIC Algorithm.

Figure 4 shows the power consumption of the different implementations. The power
consumption was established using PowerCount [2] for an operating frequency of 10 MHz and a
supply voltage of 5 V. As can be seen in Figure 4, the Approximation technique uses only 0.6 mW
at 10 MHz, compared to the CORDIC Algorithm which uses 17 mW, a reduction in power
consumption by a factor of 25. The modified Lookup Table requires 78.5% of the power that is
consumed by the normal Lookup Table. The reason for this is that the first 23 of the 128 input
values (from 0000000 to 0010110) are directly assigned to the output. The version using the
Approximation technique needs only 66.7% of the power of the Lookup Table and therefore this
version is the best with respect to power consumption.


Figure  4:  Power  Consumption 


Figure  5:  Area  Requirements 


Figure 5 presents the area requirements of the different implementations. As with the power
consumption, the best result with respect to the required area is produced by the Approximation
technique. This version uses 53.5% of the area required by the Lookup Table, 79.4% of the area
required by the modified Lookup Table and 24 times less area than the original
CORDIC algorithm.


4  Conclusion 

The aim of this paper was to show that the investigation of traditional algorithms can reduce power
consumption significantly. For this purpose the arctan function was investigated and three
alternative implementations to the traditional CORDIC algorithm were implemented.

It was possible to reduce the power consumption of the traditional implementation by a factor
of 25. This reduced power consumption was achieved without compromising any other
performance feature such as accuracy or throughput. In fact, most other performance parameters
also improved. For example, the required silicon area was also reduced by a factor of 24.
Therefore, the authors have shown that traditional multi-purpose algorithms may not be optimised
towards power consumption. In addition, there are many possibilities of performing the same
operations with significantly reduced power consumption, without compromising the overall
performance of any aspect of the implementation. It was therefore shown that it is worthwhile to
invest time and resources to investigate alternative implementations of traditional algorithms.

Abstract. Frequency Synthesisers are the key to precise time, frequency
and  phase  in  communications  transmitters  and  receivers  as  well  as  in  radar  pulse 
compression.  Digital  implementations  such  as  direct  digital  synthesis  (DDS)  are 
frequently  selected  because  of  their  inherent  advantages.  Limiting  factors  are  the 
spurious  frequencies,  which  are  generated  due  to  quantisation  effects  and  an 
output  frequency  limited  to  less  than  half  of  the  sampling  frequency  according  to 
the  sampling  theorem  and  taking  into  account  a  realisable  anti-image  filter.  The 
fundamental  part  of  any  DDS  architecture  is  a  periodically  overflowing  phase- 
accumulator  which  generates  linearly  increasing  phase  values.  In  this  paper  we 
review  an  efficient  self-timed  highly-pipelined  accumulator  architecture 
implemented  in  MESFET  Gallium  Arsenide. 


I.  Introduction 

Frequency  Synthesisers  [1]  [2]  are  the  key  to  precise  time,  frequency  and  phase  in 
communications  transmitters  and  receivers  as  well  as  in  radar  pulse  compression.  Digital 
implementations  such  as  DDS  are  frequently  selected  because  of  their  inherent  advantages. 
These  are  as  follows:  any  desired  frequency  resolution  can  be  obtained  merely  by  increasing 
the  number  of  bits;  fast  switching  and  settling  times  and  a  very  wide  tuning  range  are 
available; the phase of the synthesised signal remains continuous during frequency switching;
different  modulation  schemes  (FM,  PM  and  AM)  are  easily  implemented;  ideal  I/Q 
decomposition  is  obtained;  cost  is  reduced  due  to  the  absence  of  alignment  requirements; 
improved  temperature  and  aging  stability;  remote  control  capability;  small  size,  mass  and  low 
power  consumption.  Limiting  factors  are  the  spurious  frequencies,  which  are  generated  due  to 
quantisation  effects  and  an  output  frequency  limited  to  less  than  half  of  the  sampling 
frequency  according  to  the  sampling  theorem  and  taking  into  account  a  realisable  anti-image 
filter.  A  periodically  overflowing  phase-accumulator  generates  linearly  increasing  phase 
values.  Using  these  values  to  address  a  ROM,  which  contains  samples  of  the  desired 
waveform,  time  discrete  and  quantised  samples  are  produced.  A  digital  to  analogue  converter 
(DAC) in combination with a low-pass filter converts these samples into an analogue signal.
Fig.  1  shows  the  basic  principle  of  the  Direct  Digital  Frequency  Synthesis. 


[Figure 1 block labels: Phase Truncation Noise; Look-up Table Compression Noise; Finite ROM Output Wordlength Noise; DAC Noise]


Fig.  1.  DDS  principle 
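
The accumulator and ROM stages of Fig. 1 can be sketched behaviourally as follows. The relation f_out = FCW · f_clk / 2^N is the standard DDS tuning equation; the function names, word lengths and the sine ROM contents are illustrative assumptions, and the DAC and low-pass filter are not modelled.

```python
import math

def output_frequency(fcw, acc_bits, f_clk):
    """Standard DDS relation: f_out = FCW * f_clk / 2**N."""
    return fcw * f_clk / (1 << acc_bits)

def dds_samples(fcw, acc_bits, addr_bits, amp_bits, n):
    """Return n quantised samples from a phase accumulator driving a sine ROM."""
    rom_size = 1 << addr_bits
    full_scale = (1 << (amp_bits - 1)) - 1
    # ROM holds one period of the desired waveform (finite output wordlength).
    rom = [round(full_scale * math.sin(2 * math.pi * k / rom_size))
           for k in range(rom_size)]
    mask = (1 << acc_bits) - 1
    phase = 0
    samples = []
    for _ in range(n):
        # Phase truncation: only the top addr_bits address the ROM.
        samples.append(rom[phase >> (acc_bits - addr_bits)])
        # The accumulator overflows periodically (wrap-around addition).
        phase = (phase + fcw) & mask
    return samples

print(dds_samples(fcw=64, acc_bits=8, addr_bits=8, amp_bits=8, n=5))
print(output_frequency(64, 8, 1e6))
```

Increasing `acc_bits` refines the frequency resolution without changing the ROM size, which is the property the introduction refers to.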

The  following  sections  demonstrate  an  efficient,  high-speed,  low-power  implementation 
of  an  accumulator  in  MESFET  GaAs  using  a  self-timed  approach. 


II.  Self-Timed  MESFET  GaAs  Systems 

The recently introduced Pseudo Dynamic Latched Logic (PDLL) [3] and Latch Coupled
FET Logic (LCFL) [4] GaAs logic families have been shown to be a good compromise between
high speed and low power dissipation for both synchronous and asynchronous systems. They
compare well with other design styles in terms of speed/area and speed/(area·power) based
figures of merit and are especially efficient for highly pipelined systems. The main advantage
of the latched structure is provided by the feedback, which ensures that the noise margin is
higher than for a simple Direct Coupled FET Logic (DCFL) gate. This enables the use of serial
connections of the E-type transistors in the pull-down section. Therefore, in GaAs latched
logic it is possible to implement logic gates based on the AND function, which gives more
freedom in the design and leads to more area-efficient circuits.


A self-timed approach has been chosen for the accumulator in order to eliminate the need
for global distribution of extremely high frequency clock signals, with the expected benefits of
reduced power dissipation and inherent delay insensitivity. Since timing is of utmost importance
in the frequency synthesiser, a synchroniser needs to be used prior to the D/A conversion.


Fig.  2.  (a)  PDLL  cell,  (b)  LCFL  cell,  (c)  A  self-timed  pipeline 


Fig. 2(a) shows a basic PDLL cell while the LCFL cell for self-timed applications is
depicted in Fig. 2(b). The self-timed pipeline is depicted in Fig. 2(c). The Muller-C
element, as can be seen in Fig. 2, is the fundamental component of the handshake path of the
self-timed pipeline. In terms of logic operation, it implements the AND function for events,
such that if a specific transition takes place at one input and it is coincident with, or followed
by, a similar transition of the other input(s), then that transition will be presented at the output.
In conventional logic terms its function can be described as:

Y(i+1) = Y(i)·(A + B) + A·B


Using an LCFL GaAs gate this equation can be implemented in the structure presented
in Fig. 3.


(a) [circuit diagram with inputs A and B]

(b)
A  B  Y(i+1)
0  0  0
0  1  Y(i)
1  0  Y(i)
1  1  1


Fig.  3.  LCFL  implementation  of  the  Muller-C  gate. 
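
The next-state equation and the truth table of Fig. 3(b) can be checked against each other with a few lines of code. This is a purely logical model with a hypothetical function name; it says nothing about the LCFL circuit itself.

```python
def muller_c(a, b, y):
    """Next state of a 2-input Muller-C element: Y(i+1) = Y(i)(A + B) + AB."""
    return (y & (a | b)) | (a & b)

# Enumerate the behaviour: the output follows the inputs when they agree
# and holds its previous value when they differ.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, [muller_c(a, b, y) for y in (0, 1)])
```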


III.  The  MESFET  GaAs  Accumulator 

The accumulator has been designed using the self-timed approach and the basic LCFL
cells. A one-bit adder cell consists of the sum cell shown in Fig. 4(a) and the carry cell
presented in Fig. 4(b). Both cells use the AND connections in the pull-down section as this is
allowed in the latched-logic design style. The sum cell uses the complements of the input data
as these are readily available as shown in Fig. 2(b).


Fig.  4.  (a)  The  sum  cell,  (b)  The  carry  cell,  (c)  A  special  memory  cell 


The accumulator structure for 4 bits is shown in Fig. 5(a). It can be observed that the
handshaking hardware overhead is not significant, especially for higher numbers of bits,
although because of the limited fan-out of the GaAs gates some buffering might be required.
The accumulator contains one-bit adder cells, and pre-skew and de-skew sections consisting
of simple delay cells. However, because of the feedback present in the accumulator, special
memory cells have to be employed. The memory cells ensure that, regardless of the delay in
the input data (which can be asynchronous), the accumulator correctly adds the new set of data
to the current contents. The memory cell, shown in Fig. 4(c), uses the basic cell from Fig. 2(b)
and one Muller-C cell to read the output of the adder and is triggered by the Complete signal
from the adder cell.
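
Functionally, the accumulator repeatedly adds the input word to its stored contents using a chain of one-bit adder cells. The sketch below models only this arithmetic for a 4-bit word, using textbook sum and carry functions (XOR and majority); the handshaking, the skew sections and the memory cell timing are not modelled, and the function names are illustrative.

```python
def full_adder(a, b, carry_in):
    """One-bit adder cell: returns (sum, carry_out)."""
    total = a ^ b ^ carry_in
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)
    return total, carry_out

def add_word(acc, word, width=4):
    """Add two width-bit words bit by bit; the final carry is discarded."""
    result = 0
    carry = 0
    for bit in range(width):
        s, carry = full_adder((acc >> bit) & 1, (word >> bit) & 1, carry)
        result |= s << bit
    return result

def accumulate(word, steps, width=4):
    """Return the accumulator contents after each of `steps` additions."""
    acc = 0
    history = []
    for _ in range(steps):
        acc = add_word(acc, word, width)
        history.append(acc)
    return history

print(accumulate(0b0011, 6))
```

Discarding the final carry gives the periodic overflow that a phase accumulator relies on.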


IV. Simulation

The accumulator performance has been assessed using Hspice. Figure 5(b) shows the
waveforms at the output of the accumulator for a 4-bit data word 0011, and Figure 5(c) shows
the relationship between the spread of 0.6 µm GaAs MESFET process parameters and the
power dissipation of the circuit. The circuit throughput is 0.5 Gsps and does not depend on the
data word width. It is expected that this value will further increase for the 0.4 µm process.




Fig. 5. A 4-bit version of the accumulator: (a) structure, (b) output waveforms for 0011 data
word, (c) power dissipation as a function of Vj spread


V.  Conclusions 

The architecture of a self-timed GaAs MESFET accumulator has been presented. The
problem of feedback in the high-speed pipeline has been resolved. The circuit is characterised
by high speed, low power dissipation and inherent delay-insensitivity. It is specifically
designed for application as the phase accumulator of a Direct Digital Frequency
Synthesiser.

Abstract: The need for low-power embedded processing cores and
reduced EM noise is a pressing force towards changes in system design.
This has suggested the adoption of self-timed systems, which consume
energy and produce noise depending on data. In this paper, we analyze
this data-dependency feature of self-timed systems by introducing Delay
Probability Graphs. Delay Probability Graphs prove to help the designer
efficiently and are also useful when applied to synchronous design.

I.  Introduction 

As suggested by the 1997 SIA roadmap, in a decade's time CMOS technology will reach a
point where switching delays will be less important than interconnection delays. In such a
scenario, designers will face the daunting task of designing complex and extremely wide
clock routing networks. The clock signal will no longer be able to travel from one side
of the chip to the other in one cycle. This will affect the complexity of the design process from
the architectural to the layout level and limit performance improvements. A natural solution to
this problem is to adopt local communication, avoiding global clocks (i.e. self-timed computing
[1]). This approach may also be effective in reducing power demand, electromagnetic noise,
timing faults and so on. Asynchronous circuit design has recently been used and tested in a few
processing core designs: these cores are fairly simple, concentrating on
small ISA subsets or on the ARM embedded core.

In this paper, we focus our attention on the data-dependent behavior of such asynchronous systems: the delay of a unit depends on both the input data and the specified operation. This property allows variable delays for complex units like ALUs and FPUs, with a possible reduction of the overall execution time with respect to the synchronous discipline. The data-dependent behavior is analyzed through Delay Probability Graphs (DPGs) and applied to the design of an asynchronous 32-bit ALU core as part of a larger processor design [5]. The paper is organized as follows: section 2 formally describes the Delay Probability Graphs. In section 3, DPGs are applied to the design of the asynchronous ALU and different circuits are analyzed. The final design is described in section 4. Some conclusions are drawn in section 5.

II. Delay Probability Graphs

In order to formally describe Delay Probability Graphs some definitions are needed. Given a general unit S with n inputs X and m outputs Y, we define an input pattern as a generic input X, the input set I as the set of all possible input patterns, and an output pattern as a generic output Y = S(X). We indicate with $\Delta(X)$ the delay needed to fully produce the output pattern Y corresponding to an input pattern X. We define a delay set insisting on an interval $[t_1, t_2]$ as the set DS such that:

$$X \in DS \iff X \in I \,\wedge\, \Delta(X) \in [t_1, t_2] \qquad (1)$$

We call dimension a function dim(DS) which returns the number of elements of DS. Given a group of n delay sets $DS_i$, each insisting on an interval $[t_{1i}, t_{2i}]$, it is a uniform covering of the input set if and only if the following statements are true:

$$\forall i,j:\; t_{2i} - t_{1i} = t_{2j} - t_{1j}$$

$$\forall i,j:\; i \neq j \iff DS_i \cap DS_j = \emptyset \qquad (2)$$

$$I = \bigcup_{j} DS_j$$

Fig. 1: Comparative Delay Probability Graph for 8-bit integer Ripple Carry Adders.

A Delay Probability Graph (DPG) for a unit S is defined as a bar representation of one of its possible uniform coverings. Each bar i has height dim($DS_i$) and is placed on the x-axis at the interval that $DS_i$ insists on. An example of DPG for two 8-bit ripple carry adders is presented in Fig. 1. A DPG is thus a graphical representation of the percentage of input patterns that generate their output within a given delay interval, and its shape also depends on the chosen intervals. Each dim($DS_i$) value is proportional to the probability that the output is ready within the corresponding delay interval. A DPG can be generated only with complete knowledge of the behavior of the unit S. It allows comparison of different modules on the basis of their delay distribution and data-dependent behavior.
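As a rough illustration of these definitions, the following Python sketch builds a DPG by exhaustive enumeration of the input set. The delay model (`rca_delay`: one full-adder delay plus the longest carry chain, counted in unit delays) and all function names are assumptions made for this example; they are not the authors' extraction environment or their circuit-level delays. Half-open intervals are used so that the delay sets are guaranteed to be disjoint.

```python
from itertools import product


def rca_delay(a, b, width):
    """Assumed delay model for a ripple carry adder, in full-adder delays:
    one delay for the local sum plus the longest carry chain."""
    longest = run = 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        if x & y:                 # generate: a new carry chain starts here
            run = 1
        elif (x ^ y) and run:     # propagate: incoming carry ripples one bit on
            run += 1
        else:                     # kill, or propagate without incoming carry
            run = 0
        longest = max(longest, run)
    return 1 + longest


def delay_probability_graph(delay_fn, input_set, t_min, bin_width, n_bins):
    """Return dim(DS_i) for a uniform covering made of n_bins intervals
    [t_min + i*bin_width, t_min + (i+1)*bin_width)."""
    dims = [0] * n_bins
    for pattern in input_set:
        i = int((delay_fn(*pattern) - t_min) // bin_width)
        if not 0 <= i < n_bins:
            raise ValueError("intervals do not cover the input set")
        dims[i] += 1
    return dims


WIDTH = 8
inputs = product(range(2 ** WIDTH), repeat=2)      # exhaustive input set I
dims = delay_probability_graph(
    lambda a, b: rca_delay(a, b, WIDTH), inputs,
    t_min=1, bin_width=1, n_bins=WIDTH + 1)
total = sum(dims)
for i, d in enumerate(dims):
    print(f"delay {i + 1}: {d:6d} patterns ({100 * d / total:5.1f} %)")
```

Dividing each bar by the size of the input set gives the probability that the output is ready within that interval; changing `bin_width` changes the shape of the graph, as noted in the text.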


III. ALU Basic blocks DPGs and design

Delay Probability Graphs can be generated only through exhaustive simulation of the module under investigation. Moreover, different algorithms and architectures for the same module generate different DPGs; even different solutions for completion detection influence the DPG. For this reason, a proper DPG extraction environment has been studied, which can produce the required data in reasonable time and with reasonable precision.

We have generated the DPGs with a program written in C/C++, which allows timed execution of C/C++ scripts describing the module under investigation. The program and the scripts use delays extracted from circuit simulation of the basic blocks. The program also generates statistics of the probability graph and test patterns to verify the precision of those statistics. The adoption of a programming language has proved more efficient and faster than HDLs or exhaustive circuit simulation. It has also proved more flexible and easier to customize to the different statistics needed.

For our design, we have modeled different schemes for addition, multiplication, shifting, floating-point rounding, etc. Basic cells have been designed following the standard cell library convention in 0.5 µm MOSIS technology with a 3.3 V supply voltage. As an example, in Fig. 1 we report the DPG for 8-bit ripple carry adders, where RCA1 is based on a 1-bit DCVL FA cell and RCA2 on a 2-bit one. RCA2 is easily recognized as the better choice, since its distribution is narrower and its delays are smaller. Most of the RCA2 input patterns generate their output faster than the fastest RCA1 input pattern.


Fig. 2: Comparative Delay Probability Graph for 8-bit unsigned integer multipliers producing 16-bit result.

In Fig. 2 we report the DPG for three configurations of 8-bit multipliers, all based on a 1-bit DCVL FA cell and generating a 16-bit result. The schemes being investigated are a 3:2 Wallace tree (WT), an array multiplier (ARRAY) and a conditional array multiplier (ARRc). The ARRc configuration has recently been proposed as a valid way to improve data-dependent behavior [2]. The DPG in Fig. 2 confirms this argument, but at the same time it shows that this configuration is a poor design choice because of its wider DPG (i.e. greater standard deviation). As regards the WT and the ARRAY schemes, their DPGs are similar, with WT having a slightly wider graph. The ARRAY configuration has a smaller standard deviation, which corresponds to less data-dependent behavior. For the ALU design we have chosen the ARRAY scheme because of its better behavior at slow speed and possible optimizations that reduce both area and latency.
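The comparison by "width" can be made quantitative by computing the mean and standard deviation of a DPG from its bar heights. The sketch below is only an illustration: the function name `dpg_stats`, the use of bin centres as representative delays, and the two sets of bar heights are assumptions (the numbers are made up and are not data from Fig. 2).

```python
import math


def dpg_stats(dims, t_min, bin_width):
    """Mean and standard deviation of the delay, using bin centres as
    representative delays and dim(DS_i) as weights."""
    total = sum(dims)
    centres = [t_min + (i + 0.5) * bin_width for i in range(len(dims))]
    mean = sum(c * d for c, d in zip(centres, dims)) / total
    var = sum(d * (c - mean) ** 2 for c, d in zip(centres, dims)) / total
    return mean, math.sqrt(var)


# Made-up bar heights for two hypothetical modules (not measured data).
narrow = [0, 6, 10, 0]
wide = [4, 4, 4, 4]
for name, dims in (("narrow", narrow), ("wide", wide)):
    mean, sigma = dpg_stats(dims, t_min=0.0, bin_width=1.0)
    print(f"{name}: mean = {mean:.2f}, sigma = {sigma:.2f}")
```

A module with a smaller standard deviation behaves more uniformly across input patterns, which is the sense in which the text calls a narrower DPG less data dependent.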

The DPGs in Figs. 1 and 2 have been generated for asynchronous logic, so they are influenced by the completion detection logic. The C/C++ scripts allow evaluation of the DPG even when the completion logic is not used (synchronous case). A comparative DPG analysis shows that the completion logic has two main effects. First, the best-case delay is increased: for example, in an RCA the best case with completion logic must be greater than the delay of the completion logic, while it is equal to two 1-bit FA delays without it. Second, data-dependent behavior is introduced or amplified in schemes that are regular (e.g. WT), while it is reduced in intrinsically data-dependent modules (e.g. ripple carry adders). These results suggest that synchronous designs could benefit more from data-dependent delays if these are properly exploited [3].


IV.  The  asynchronous  ALU 

The block view of the 32-bit asynchronous ALU is presented in Fig. 3. The ALU is composed of a ROM for opcode decoding, an asynchronous locking register file with a dedicated locking port [4], and the Functional Unit (FU). Handshaking follows the zero-overhead PS0 configuration for all stages except the FU. For this unit, a slightly different choice has been made so as to allow the preceding unit to restore its status independently. This proves to be extremely efficient with units, like the FU, whose delay is not predictable a priori.
The FU module has been designed at circuit level in order to execute the unsigned integer subset of the HCP instruction set. The basic blocks are a 32-bit ripple carry adder using the organization of RCA2, and an array multiplier generating a complete 64-bit result. The multiplier follows the same algorithm as the ARRAY configuration, but it is implemented with a non-pipelined self-timed ring and a 32-bit RCA2 adder for the upper double-word (i ORM in Fig. 6). The ALU has a minimum latency of about 6.5 ns and a maximum one of about 16.2 ns; cycle time varies from the best case of 3 ns to the worst case of 11.6 ns. Energy dissipation strongly depends on both data and operation.


Fig. 3: High-level view of the ALU core structure (blocks: decode ROM, GPRF 2R/1W/1L with locking, FU, write-back).


V.  Conclusions 

Improving system performance pushes designers to look for every possible design path and property. In this paper, we have introduced Delay Probability Graphs (DPGs) to analyze the data-dependent behavior of circuits. DPGs have been applied here to the design of basic cells for an asynchronous 32-bit ALU. Their application to synchronous design is also feasible and sensible, and it represents the next step of the current work.
Abstract. To enhance efficiency in FPGA based rapid prototyping of digital telecommunication applications we insert the Protocol Compiler into a proven design flow. This paper focuses on domain specific modeling styles and synthesis strategies aiming at design improvement, changeability, and reuse. We discuss two design case studies: a DAB (Digital Audio Broadcasting) Test Data Generator and a Digital Audio SPDIF Receiver. The achieved results are summarized with respect to design quality and efficiency.

1  Introduction 

The requirements arising from controller design for telecommunication applications are the starting point of our work. Using a proven rapid prototyping environment, we were faced with a bottleneck in the design process as visualized in Fig. 1: while the control part might occupy only 15% of the chip area, it often takes up to 75% of the design and debug effort [SHM-96]. Therefore, we studied high-level modeling styles and synthesis strategies to enhance efficiency in controller design for structured data stream processing.


Fig. 1. Control Part: final chip area versus design and debug effort (CP: Control Part, DP: Data Path)

The paper is organized as follows. First we discuss related work. Section 3 describes our experimental rapid prototyping design flow using the Protocol Compiler and outlines domain specific high-level modeling principles. Sections 4 and 5 explain two case studies: the design of a DAB (Digital Audio Broadcasting) Test Data Generator and the implementation of a Digital Audio SPDIF Receiver. Results concerning design quality and efficiency are presented. We finish with a summary and conclusions.

2  Related  Work 

A high-level approach for compiling and debugging structured data processing controllers, first published in [SHM-96], is based on previous work by Seawright and Brewer concerning logic synthesis from grammatical productions [Sea-94]. These results influenced the development of the Dali-approach and its integration into the Protocol Compiler [Syn-98]. Various application reports [HoB-98, Bau-99, SDF-99] indicate improved design productivity when exploiting the capabilities of this high-level synthesis approach.

To enhance the efficiency of FPGA based rapid prototyping, the paper [FRK-98] inserts into the design flow a finite state machine (FSM) partitioning technique that takes technology specific features into consideration. Based on this previous work and on first experiences in utilizing Protocol Compiler in a DAB project [SDF-99], this paper outlines high-level modeling principles and related synthesis solutions through the discussion of two case studies.


3  Design  Environment  and  Modeling  Principles 

Our conventionally used design flow starts at RT-Level with a VHDL description of the design specification. As a new element in the chain of this flow, the Protocol Compiler is set on top of the whole process (Fig. 2), by means of which the high-level specification is graphically composed. The Protocol Compiler provides the following features: graphical protocol level entry, formal protocol analysis, back annotation simulation, controller logic partitioning and synthesis, and VHDL code generation. The graphical symbolic format (Fig. 3a) is similar to the Backus-Naur notation. It closely matches the high-level protocol specification. Using these specification facilities, Fig. 3b shows the embedding of DAB audio test data into the DAB frame structure and a way to expand the subchannels: to add another module for inserting additional data services into the data stream, a sequence is created by surrounding the existing block with a sequential frame operator. Afterwards, the resulting unspecified frame is replaced by a copy of the first module. Finally, a few parameters need to be adjusted.

Fig. 2. Protocol Compiler in an experimental rapid prototyping design flow

Fig. 3. a) Types of frames and frame operators, b) Modeling example


4  Case  Study  TDG:  Design  of  a  DAB  Test  Data  Generator 

The digital radio system DAB is a broadband system, which transmits multiple programs in a common program block: the ensemble transport interface ETI. Digitized and preprocessed audio signals and data services are put together using a multiplexer (DAB-MUX, Fig. 4). This multiplexer generates a complete ETI data stream out of the supplied input streams. The stream is fed to a modulator and finally the modulated HF carrier is sent out via an antenna. The ETI data stream is a structured compilation consisting of three hierarchically subdivided parts: the synchronization block, the LIDATA, and the FRPD field. In a DAB project our institute is responsible for the DAB multiplexer design. Consequently, a suitable TDG is a useful facility to support design validation. The TDG can be used either as a MUX input signal source or to emulate the MUX output.

Now we discuss the ways to generate the hierarchical frame structure as visualized in Fig. 5. There are three basic approaches to obtain the needed clock cycle shift. The first one starts all branches in parallel and delays the actions by placing counters right before them. To obtain the desired timing, it is necessary to work in parallel because some variables need to be prepared in the clock cycle before they are serialized. The second possibility employs RunIdle frames [Syn-98]. The third way is sequential processing. It consistently sequentializes the design as shown in Fig. 5: a successive action is not started before its predecessor ends. The timing is guaranteed by the serializers.
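The difference between the counter-based and the sequential style can be pictured with a small Python model. This is only a sketch under stated assumptions: the field lengths are hypothetical (the real ETI field sizes are not given in this text), and the functions `start_cycles` and `serialize` are illustrative names that have nothing to do with the Protocol Compiler's own notation.

```python
from itertools import accumulate

# Hypothetical field lengths in clock cycles (not the real ETI sizes).
FIELDS = [("SYNC", 4), ("LIDATA", 10), ("FRPD", 2)]


def start_cycles(fields):
    """Cycle in which each field starts when every action begins only
    after its predecessor has ended (sequential processing)."""
    ends = list(accumulate(length for _, length in fields))
    starts = [0] + ends[:-1]
    return {name: start for (name, _), start in zip(fields, starts)}


def serialize(fields):
    """Cycle-by-cycle schedule: which field is emitted in each cycle."""
    return [name for name, length in fields for _ in range(length)]


print(start_cycles(FIELDS))   # {'SYNC': 0, 'LIDATA': 4, 'FRPD': 14}
print(len(serialize(FIELDS)))  # 16
```

In the counter-based style, the same start values would be the delays loaded into the counters placed in front of each parallel branch; in the sequential style they follow implicitly from the order of the frames.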
Fig. 5. Hierarchical ETI frame structure and corresponding high-level model


The results produced by the three modeling styles are shown in the bar charts in Fig. 6. Comparisons were made concerning the number of generated VHDL statements, the number of CLBs, the circuit delay, and the total CPU time needed for the whole synthesis process. We implemented the test data generator using sequential processing. It fits into a Xilinx FPGA XC4013E using 421 out of 576 CLBs and provides two independent data channels for audio data and for digital data services.

Fig. 6. Synthesis results: using (1) counters, (2) RunIdle frames, (3) sequential processing
5 Case Study DAR: Design of a Digital Audio SPDIF Receiver

To estimate the efficiency of our extended rapid prototyping design flow we redesigned the Digital Audio Receiver, a project published by Blinzer [HoB-98]. This decoder receives SPDIF audio streams [IEC985] and transmits audio data to a digital-to-analog converter (DAC) over a serial interface (Fig. 7). The SPDIF receiver design requires the separation of the SPDIF protocol layers: bit level, block level, and frame level. The output data stream needs to be reassembled according to the DAC input protocol.

Fig. 7. Hardware Environment

To compare the two styles, (1) high-level synthesis using Protocol Compiler and (2) design by means of a graphical RTL entry, we performed the following steps: design entry, validation, code generation, and synthesis using FPGA Compiler and Xilinx M1. Fig. 8 shows the RTL design methodology to be less efficient in terms of the number of CLBs as well as of the maximum clock rate. Furthermore, the overall design and debug effort is reduced by approximately a factor of two in the case of the high-level approach.

Fig. 8. High Level- versus RTL-Modeling


6  Conclusions 

We have presented high-level modeling principles and related synthesis results achieved for two design projects. We utilized an FPGA rapid prototyping design flow which we extended by a graphical high-level design entry. This approach supports an application-oriented modeling style at the level of specification and enhances design efficiency and quality for structured data stream processing controllers. Additionally, it results in quick design exploration, easy changes, and design cycle reduction. A further improvement in controller design can be expected from reuse methodologies. Consequently, our future work aims at extending our rapid prototyping design flow with a library of reusable protocol templates and components, and at applying it to other projects in the telecommunications and networking area.
Abstract: A configurable core of the 8051 µC is presented. The core is fully consistent with the industrial standards 80C51 and 80C52 regarding the instruction set and timing. Nevertheless the core's structure is flexible and can be configured according to user requirements. Features such as the instruction set, interrupts, memory sizes etc. are easily modifiable. The core design has been checked in real applications and is now offered as an IP block.

I.  Introduction 

Nowadays a significant trend towards developing core-based (IP-based) designs can be observed in the area of complex application specific systems [1]. The rapidly growing market of soft, firm and hard macros offered by many IP providers sustains and even strengthens this tendency. IPs are integrated at different levels of abstraction (from system level to transistor level) in the design flow. This certainly leads to a reduction of the time-to-market and the overall design cost.

HDLs allow the creation of soft macros, which can be mapped onto various technologies, e.g. standard cells, FPGAs etc. HDLs are currently applied most efficiently at the RTL. The reason for that lies in the synthesis tools, which have not yet been well adapted to higher levels of abstraction. Soft macros of microprocessors and microcontrollers are very popular. Among them the 8051 µC [2] and its derivatives are still used, particularly in control systems.

At ITE an 8051-like µC core has been developed at the RTL using the synthesizable subset of VHDL. The core is compatible with the industrial standards 80C51 and 80C52 regarding the instruction set and timing. It is generic as well, which means it can be configured according to particular design requirements (e.g. customized instruction set, interrupt priorities, execution cycle etc.).

II.  A  core  architecture  concept 

The core can be roughly divided into two parts: an 8051 CPU and optional peripheral modules. During development of the CPU architecture, strong emphasis was put on making it independent of the peripheral modules connected to it. Optional peripheral modules are not necessary for correct functioning of the CPU and its software compatibility (instruction execution times are consistent with the industrial standard). However, they are indispensable if a fully compatible (in terms of timing) 8051 µC is being built. Peripheral modules are attached to the CPU through an SFR (special function register) bus, which consists of address and data buses and some control signals (RD, WR and SFR_EN). Peripherals are accessed through special function registers (SFRs), which in our approach are included in the peripherals. The upper half of the available internal RAM addresses (from 80H to FFH) is reserved for SFRs. Hence the communication between the CPU and an SFR is carried out in the same way as with any memory cell. Therefore each peripheral must include a piece of logic (e.g. an address decoder) for handling the SFR bus. This gives the CPU uniform access to each SFR. It is worth noting that the CPU communicates over two buses: the first one serves the internal and external ROM, the second one the internal and external RAM and the SFRs. Thus the SFR bus is only a subset of the common bus.
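The idea that each peripheral owns its SFRs and decodes its own addresses can be sketched behaviourally in Python. The classes `Peripheral` and `SfrBus` are hypothetical and only mimic the RD, WR and SFR_EN signals named above; they do not reproduce the VHDL of the core. The addresses used (TCON 88H, TMOD 89H, SCON 98H, SBUF 99H) are the standard 8051 ones.

```python
class Peripheral:
    """Peripheral that owns its SFRs and decodes its own addresses."""

    def __init__(self, name, sfr_addresses):
        if any(not 0x80 <= a <= 0xFF for a in sfr_addresses):
            raise ValueError("SFRs live in the range 80H..FFH")
        self.name = name
        self.regs = {a: 0x00 for a in sfr_addresses}

    def access(self, addr, sfr_en, rd=False, wr=False, data=0):
        """React to one bus cycle; return read data, or None if not selected."""
        if not sfr_en or addr not in self.regs:    # local address decoder
            return None
        if wr:
            self.regs[addr] = data & 0xFF
        return self.regs[addr] if rd else None


class SfrBus:
    """Broadcasts every access; only the owning peripheral responds."""

    def __init__(self, peripherals):
        self.peripherals = peripherals

    def write(self, addr, data):
        for p in self.peripherals:
            p.access(addr, sfr_en=True, wr=True, data=data)

    def read(self, addr):
        for p in self.peripherals:
            value = p.access(addr, sfr_en=True, rd=True)
            if value is not None:
                return value
        return None    # no peripheral decodes this address


bus = SfrBus([Peripheral("serial", [0x98, 0x99]),
              Peripheral("timer", [0x88, 0x89])])
bus.write(0x99, 0x5A)          # write SBUF
print(hex(bus.read(0x99)))     # 0x5a
```

Because the CPU side only broadcasts an address and control signals, adding or removing a peripheral does not change the CPU model, which is the independence the text describes.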

The CPU architecture is shown in Fig. 1. Four crucial functional blocks can be distinguished in the core: an arithmetic logical unit, an interrupt unit, an instruction decoder, and an execution controller. The first two modules have been implemented as peripheral modules. They contain their own internal SFRs and are connected to the interperipheral bus according to the general concept of the core's access to peripheral modules. There is also a block of registers used directly in the CPU, which contains the following: an instruction register (IR), a program counter (PC), a stack pointer (SP), a data pointer (DPTR), and a power control word (PCON).

The CPU interface has been designed so that the CPU can be used either to build a µC that conforms to the industrial standards or as a standalone software compatible CPU with application specific peripherals attached as required. Peripherals connected to the CPU can be either those characteristic of the 8051 µC (see Fig. 2), such as a serial port, a timer/counter module, parallel ports and external memories, or any others designed to satisfy the aim of a single project.

All standard peripherals have also been designed and verified regarding compatibility. Moreover, some other non-standard modules are available within the core, such as a matrix keyboard peripheral and a 7-segment display driver.


Fig. 1 Block diagram of the CPU.


III. Configuration properties of the core

As already mentioned, the core has been designed to be fully compatible with the industrial standard. Nevertheless the core is very flexible as well. This is due to its generic structure. The core can be easily configured according to special requirements. This leads to more efficient use of the core in various applications and to reduced design and fabrication costs.

All configuration capabilities of the core have been grouped in a configuration package. This separates them from the fixed part of the core, which remains untouched by the user and can be encoded to hide the VHDL code. Inside the configuration package there are several constants. By assigning values to them, the user can determine the core's structure, even the instruction set and execution cycle. Listing 1 shows how the instruction set is configured. Each instruction that is not needed can be eliminated by setting the corresponding constant to zero.
    constant INSTR_CODE_00_EN_CONST : Int_Bit := 1;  -- NOP
    constant INSTR_CODE_01_EN_CONST : Int_Bit := 1;  -- AJMP addr11
    constant INSTR_CODE_FF_EN_CONST : Int_Bit := 1;  -- MOV R7,A

Listing 1. An excerpt from the core's configuration package with settings of µC instructions.


The user's settings for the instruction set are propagated down through the core's hierarchy. They affect the generics of instantiated components. Let us focus on the ALU now (see Listing 2). Instructions performed by this module have been divided into several groups. Those belonging to the same group share a piece of common hardware needed to execute them. The multiplication instruction (MUL AB) is switched on if the parameter MUL_EN is set. MUL_EN depends directly and only on the core parameter INSTR_CODE_A4_EN and forms a one-element group, since the instruction is carried out only on operands stored in the accumulator and register B. The situation is different for addition with carry (ADDC). This instruction can be performed on operands located in different places (in the accumulator, a general-purpose working register, or memory). All variants of this instruction need the same piece of hardware and must therefore be gathered in the same group. The value of the parameter ADDC_EN depends on several core parameters (INSTR_CODE_34_EN .. INSTR_CODE_3F_EN). If at least one of them is enabled in the configuration package, the common hardware for the group is inferred by the synthesis tool.


    ALU_UNIT: ALU generic map (
        MUL_EN  =>  -- MUL A, B
            INSTR_CODE_A4_EN,
        ADDC_EN =>  -- ADDC A, Rr; ADDC A, #n; ADDC A, ad; ADDC A, @Ri
            (INSTR_CODE_38_EN or INSTR_CODE_39_EN or INSTR_CODE_3A_EN or INSTR_CODE_3B_EN or
             INSTR_CODE_3C_EN or INSTR_CODE_3D_EN or INSTR_CODE_3E_EN or INSTR_CODE_3F_EN) or
            INSTR_CODE_34_EN or INSTR_CODE_35_EN or (INSTR_CODE_36_EN or INSTR_CODE_37_EN)
    )
    port map
        -- ( signals mapping )


Listing 2. Instantiation of ALU component.
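The way per-instruction enables collapse into per-group generics can be modelled in a few lines of Python. This is a behavioural sketch only: `group_enable` and the `config` dictionary are hypothetical stand-ins for the VHDL constants, while the opcode values (A4H for MUL AB, 34H..3FH for the ADDC variants) are the standard 8051 encodings.

```python
def group_enable(config, opcodes):
    """1 if at least one instruction of the group is enabled, else 0."""
    return int(any(config.get(op, 0) for op in opcodes))


# Hypothetical user configuration: opcode -> enable flag (missing = disabled).
config = {0xA4: 0, 0x35: 1}

alu_generics = {
    "MUL_EN": group_enable(config, [0xA4]),                # MUL AB
    "ADDC_EN": group_enable(config, range(0x34, 0x40)),    # all ADDC variants
}
print(alu_generics)   # {'MUL_EN': 0, 'ADDC_EN': 1}
```

With this configuration the multiplier hardware would be left out, while the shared ADDC hardware is kept because a single ADDC variant is still enabled.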

The INTERRUPT module is another example of a parameterized module. It can service a different number of interrupts depending on the core's application. Both the number of interrupts and their hardware priority are modifiable. All available interrupts are declared in the form of the enumeration type INT_TYPE (see Listing 3). The position of a record in the INTERR_INFO array is strictly tied to its hardware priority. When an interrupt service is added or removed, the INT_TYPE type and the INTERR_INFO constant are modified.

There are also other configurable features. In the standard, the signal PSEN (program store enable) is active twice during each machine cycle, and for a few instructions (e.g. 1-byte, 2-cycle instructions) some fetches are redundant. In the core these needless fetches can be removed. Moreover, the user can influence how multiplication and division are carried out. By setting the proper generic, a one-cycle or four-cycle (as in the standard) multiplication is enabled. In a similar way, division with remainder or fraction is set up. Another feature is the possibility of attaching internal logic-state holders (bus holders), which allow the bus to go to high impedance when no module drives it. The user can also define the types of pad cells (inverting or non-inverting) intended for the port pins and the ALE and PSEN signals. Thanks to that, a designer need not care about inverters when pads are attached.


    -- Interrupts available in the 8051 microcontroller
    type Int_Type is (NONE, INT0_T, T0_T, INT1_T, T1_T, UART_T, T2_T);

    -- Interrupt setup record
    type Int_Info is record
        name  : Int_Type;     -- interrupt type
        addr  : Int_Addr;     -- interrupt vector
        index : Int_SFR_Idx;  -- index in "IE" and "IP"
    end record;

    -- Array of interrupt setup records
    -- Position in the array corresponds to the hardware priority level
    constant INTERR_INFO : Int_Info_Vector :=
       ((INT0_T, 16#03#, 0),   -- INT0 (03H) (IX.0)  highest priority
        (T0_T,   16#0B#, 1),   -- T0   (0BH) (IX.1)
        (INT1_T, 16#13#, 2),   -- INT1 (13H) (IX.2)
        (T1_T,   16#1B#, 3),   -- T1   (1BH) (IX.3)
        (UART_T, 16#23#, 4),   -- UART (23H) (IX.4)
        (T2_T,   16#2B#, 5));  -- T2   (2BH) (IX.5)  lowest priority


Listing 3. Declaration of interrupts.
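How the position in the table determines hardware priority can be illustrated with a small Python model of Listing 3. The function `next_interrupt` and its bit-mask arguments are assumptions made for this sketch; it deliberately ignores the software priority levels of the IP register and the global enable bit, which the real interrupt unit would also consider.

```python
# (name, vector address, bit index in IE/IP), highest hardware priority
# first, mirroring the order of the records in Listing 3.
INTERR_INFO = [
    ("INT0", 0x03, 0),
    ("T0", 0x0B, 1),
    ("INT1", 0x13, 2),
    ("T1", 0x1B, 3),
    ("UART", 0x23, 4),
    ("T2", 0x2B, 5),
]


def next_interrupt(pending, enabled):
    """Return (name, vector) of the pending and enabled source that comes
    first in INTERR_INFO, or None. Both arguments are bit masks indexed
    like IE/IP."""
    for name, vector, index in INTERR_INFO:
        if (pending >> index) & 1 and (enabled >> index) & 1:
            return name, vector
    return None


# T0 and UART are pending at the same time; T0 wins by table position.
print(next_interrupt(pending=0b010010, enabled=0b111111))   # ('T0', 11)
```

Reordering the entries of the table changes the hardware priority without touching the resolution logic, which matches the configuration style described for the core.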


IV.  Conclusion 

A configurable soft core of the 8051 µC designed with the synthesizable subset of VHDL has been presented in the paper. The main goal of the efforts undertaken during the design of the core was full compatibility with the industrial standards 80C51 and 80C52. This aim has been reached, while the core's structure remains very flexible. The core can be easily optimized with respect to current design constraints by changing generic parameters placed in the VHDL configuration package. In summary, the basic features of the core presented are as follows:

•  Possibility of attaching different peripherals, either standard 8051 modules or modules developed by the customer. They can be easily connected to the core by a common SFR bus

•  Peripheral modules enclose the SFRs, which makes the core independent of them

•  Full compatibility with the 80C51 and 80C52.

The  following  properties  of  the  core  can  be  set  up  in  the  configuration  package  to  match 
design  requirements: 

• Internal RAM and ROM sizes, within the ranges 0..256 Kbytes and 0..64 Kbytes respectively

•  Instruction  set  and  number  of  interrupt  sources 

•  Optional  use  of  internal  bus  holders  for  tri-state  buses 

•  Execution  of  ROM  fetches  (they  can  be  executed  only  if  needed) 

•  Number  of  cycles  for  division  and  multiplication  instructions. 

The core has been fully verified after implementation in standard cell technologies. It was
used in a smart pressure sensor chip and in an 8031-compatible µC. Compatibility with the
industrial standards has been checked and confirmed using the LV500 logic verifier.
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Abstract 

In  the  past,  Design-For-Test  (DFT)  solutions  (such  as  internal  scan,  built-in  self 
test,  and  boundary  scan),  have  been  seen  to  be  point  solutions  to  individual 
problems.  We  now  know  that  boundary  scan  opens  up  a  re-use  capability  of 
other  DFT  structures  and  this  allows  a  wider  view  of  DFT  and  how  it  fits  into  a 
product  life-cycle  view  of  test.  In  effect,  boundary  scan  has  become  the 
"internet  of  test". 

The  presentation  will  review  the  life-cycle  view  of  DFT  and  comment  on  how 
this  affects  future  developments  of  DFT  technologies. 

1.  Introduction 

Since the publication of the ANSI/IEEE 1149.1 Boundary Scan Standard in 1990 [1], there has
been a steady acceptance of boundary-scan technology by the electronics manufacturing industry.
Initially, attention was focused on the application to board manufacturing test, targeted at
boards with reduced access for traditional in-circuit bed-of-nails testers, sometimes known as the
limited access problem. The increasing use of ball-grid array device packaging and other
"tough-to-access" device packaging styles such as thin-shrink small-outline has accelerated the
adoption of boundary scan, and the technology is now clearly in the mainstream phase of the
adoption cycle.

More recently, In-System Programming (ISP) applications of boundary scan have further increased
the interest in the Standard, with many of the main suppliers of CPLDs and FPGAs supporting
the use of 1149.1 as a gateway to the programming and re-programming of their devices. The
advantages of ISP are many and are discussed later in the paper.

It is now clear, however, that boundary scan does a lot more than just solve board test and ISP
problems. A decision to use boundary scan for either of the above reasons usually opens up a
discussion about the whole life-cycle test requirements and test strategy for a product.
Effectively, boundary scan allows re-use access to other forms of test structures, such as internal
scan and built-in self test, and InScan ("scan-through-TAP") and RunBist instructions are now
commonly required in complex full-custom devices [2]. These types of instruction can be
invoked at any time during the life of the product, and it has been shown that they have
tremendous diagnostic value during system integration and field-service operations. An early
example of this is the use of boundary scan in the design of the electronic systems in the
Iridium™/SM satellites [3]. More recent examples can be found in Stratus Computer's
fault-tolerant computers [4] and in the proposal by NASA to build a 3-dimensional MCM stack
space flight computer [5].


Additionally, we are now seeing the implosion of boundary-scan concepts, from the board back
down to System-on-Chip devices. The IEEE P1500 Embedded Core Test standards activity is
veering very close to 1149.1 structures with its ideas of test-access mechanisms and
wrappers [6].


In  effect,  boundary  scan  technology  has  become  the  internet  of  test  [7],  allowing  various  forms  of 
DFT  structures  to  become  connected  for  re-use  purposes.  The  paper  explores  p 

boundary  scan  and  shows  how  the  technology  has  a  role  at  prototype  board  debug,  volume 
manufacturing  of  boards,  system  integration,  and  field  service. 


2.  The  growth  of  PC-based  testers 

To take full advantage of boundary scan through the life-cycle of a product requires supporting
test technology that is both versatile and portable. During the latter days of JTAG, the
organisation that created the Standard, several companies began to program PCs with the
boundary-scan integrity and interconnect algorithms. Indeed, the parallel printer port was even
considered to be a crude form of driver/sensor channels!

Nowadays, there is a variety of low-cost portable boundary-scan testers available, based on the
PC as a hardware platform and equipped with a wide range of test capability. From a hardware
perspective, the equipment usually consists of a PC, either a PC-Card or an expansion-slot
driver/sensor channel card, and a buffer interface to the board-under-test (Fig. 1). On the
software side, these testers come equipped to handle both boundary-scan and non-boundary-scan
components on the board, where such components can be accessed via boundary-scan
components.


3.  In-System  Programming  via  1149.1  Boundary  Scan 

In-System Programming (ISP) of CPLDs, FPGAs and Flash memory devices through board-level
1149.1 boundary-scan interfaces has become an important new application of boundary scan [8,
9, 10]. This section reviews ISP basics and describes benefits and issues.


ISP is the loading of device-configuration data into a programmable device after the device has
been assembled onto a board. ISP is also known as On-Board Programming or In-System
Configuration (for FPGAs). The programming interface to the device is through the 1149.1
board-level boundary-scan structures. Fig. 2 shows an 1149.1-compliant CPLD (device 2)
connected to other boundary-scan devices, along with a Flash memory device (device 4)
accessible from a boundary-scan device (device 3).


The  advantages  of  ISP,  compared  to  off-line  programming  stations,  are  many: 
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•  simplifies  inventory  management, 

•  reduces  or  removes  the  need  for  off-line  programming  stations, 

•  enables  rapid  prototype  configuration  and  re-configuration,  thereby  increasing  design 
flexibility, 

•  removes  the  need  for  on-board  sockets  which  are  often  a  cause  of  pin  damage, 

• reduces risk of damage caused by mechanical handling and Electro-Static Discharge, and

•  allows  program  upgrades  for  System  and  Field-Service  debug. 

Fig. 3 shows the programming data flow for CPLDs. Essentially, ISP formatting data is passed
from the PLD device-vendor's tools into a PC-based board-test-programming station where it is
embedded in the 1149.1 access protocol for the board. The programmable device is targeted
through the board boundary-scan path. Boundary-scan devices on either side of the PLD are placed
in Bypass mode, i.e. devices 1 and 3 in Fig. 2. The programmable device is loaded with whatever
instruction is necessary to commence the programming process. This instruction will first target
the programming location and then enable programming data to be serially loaded into the
addressed location.

In the case of Flash memory devices, the initialise, erase, write and verify patterns are applied
from the boundary-scan device, or devices, that have access to the Address, Data and Control pins
of the Flash, i.e. in Fig. 2, the boundary-scan register of device 3 is used to program the Flash
device, device 4. In this case, devices 1 and 2 are placed in Bypass mode and device 3 is loaded
with the Extest instruction.
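
The effect of placing neighbouring devices in Bypass mode can be sketched with a simple shift-register model. In the 1149.1 Standard the Bypass register is a single bit, so each bypassed device adds one bit to the serial path between TDI and TDO. The Python sketch below is illustrative only: the function names, the 4-bit payload and the representation of the chain as a flat list of cells are assumptions made here, and real programming data are much longer and are framed by TAP state transitions that are not modelled.

```python
def build_scan_stream(payload, bypassed_before, bypassed_after):
    """Pad a payload for the target device with one bit per bypassed device.

    The chain runs TDI -> devices before target -> target -> devices after
    target -> TDO. Bits shifted first travel furthest, so the padding for
    devices nearer TDO goes first in the stream.
    """
    return [0] * bypassed_after + list(payload) + [0] * bypassed_before


def shift_in(chain_length, stream):
    """Shift a bit stream into a chain of cells; cell 0 is nearest TDI."""
    cells = [0] * chain_length
    for bit in stream:
        cells = [bit] + cells[:-1]
    return cells


# Hypothetical example: device 2 is the target, devices 1 and 3 are bypassed.
payload = [1, 0, 1, 1]
stream = build_scan_stream(payload, bypassed_before=1, bypassed_after=1)
cells = shift_in(len(stream), stream)
target_cells = cells[1:1 + len(payload)]
```

After the shift, the first payload bit sits in the target cell nearest TDO, so `target_cells` holds the payload in reverse shift order.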

4.  Applications  of  boundary  scan  through  the  life  cycle  of  a  product 

As acceptance of boundary scan techniques grew, designers began to question the re-use of DFT
structures "above the board", i.e. during system integration and especially during field service
[11, 12]. In the case where the product has multiple boards, system designers and field service
engineers required a multi-drop architecture to allow drop-down access to individual boards, and
hence individual devices, through a backplane bus (see Fig. 4). Eventually, the IEEE 1149.5
Module Test and Maintenance Bus was approved in 1996 [13], but Texas Instruments and
National Semiconductor (now Fairchild) had already produced devices [14, 15] that allowed
1149.1 to be extended into the backplane domain. Although not as powerful as 1149.5, the use
of 1149.1 as a backplane bus opened up the possibility of DFT structure re-use and allowed a
life-cycle view of DFT.

Fig. 5 summarises this view. There are now four major DFT technologies: internal scan, BIST,
IDDQ and boundary scan. Each technique provides a point solution to a point problem. For
example, internal scan solves several problems to do with the test and testability of devices, e.g. it
enables high levels of coverage of semiconductor manufacturing defects, and the internal
scan paths allow partitioning to support the difficult process of diagnostics.

Similarly, BIST also solves many problems of testing devices: memory BIST algorithms can be
targeted directly at known defect types, allowing not only high defect coverage but also an
at-speed test to detect defects that affect high-frequency performance. BIST also reduces
problems of core access through the pins of a System-on-Chip device and provides protection of
Intellectual Property for hard cores.

IDDQ targets device defects which may not be detected through voltage measurement, and recent
work [16] has shown that IDDQ signatures can be used to help locate defects.

Boundary scan is different to the other three DFT techniques. Boundary scan targets a board test
problem, but the structure has to be inserted into a device by a device designer. In other words,
the person who sees the "pain" is different to the person who sees the "gain". However, boundary
scan opens up re-use of the other techniques through instructions such as InScan and RunBist and
by providing access to the Pass/Fail result of internal Built-In Current monitors. This means that
device DFT structures now have additional value at board level. They can be re-used to assist
board diagnostics.

The  further  development  of  multi-drop  architectures  opens  up  yet  more  re-use,  into  systems 
integration  and  finally  through  to  field  service.  The  true  value  here  is  diagnostics.  Once  boards 
are  assembled  into  racks  and  especially  when  the  product  is  in  use  by  a  customer,  determining  the 
exact  cause  of  a  fault  can  become  extremely  difficult  and  eventually,  very  costly.  So,  the  one 
word  that  dominates  the  right-hand-side  of  Fig.  5  is  "diagnostics". 

The more recent development of ISP has reinforced the role boundary scan plays in life-cycle
test. We can consider re-programming programmable devices during system integration and
during field service, either to fix a system design problem or to gracefully degrade the behaviour
of the product rather than take it off-line in the event of an environmental or
wearout defect.

Finally, the new IEEE P1149.4 Mixed-Signal Test Bus Standard is nearly complete [17, 18].
When it arrives, it will specify additional boundary-scan cells on the analogue ports of
mixed-signal devices. These cells will behave digitally in ExTest mode, thereby extending the
application of boundary scan to the detection and location of interconnect defects. Additionally,
P1149.4 will allow the application of analogue stimulus and measurement of the corresponding
analogue response through an additional two test pins (AT1, AT2) and internal analogue test
buses (AB1, AB2); see Fig. 6. The publication of this new standard will provoke yet more
interest in boundary scan as the "internet of test".

6.  Conclusions 

Boundary  scan  is  here  to  stay.  It  has  opened  up  the  possibility  of  a  life-cycle  view  of  test  through 
the  ability  to  repeatedly  re-use  DFT  structures.  This  paper  has  summarised  these  possibilities. 
For a more in-depth discussion, see the forthcoming paper [19].
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Appendix:  ISP  programming  issues  and  programming  languages 
There  are  several  considerations  regarding  in-system  programming: 

• Given that on-board CPLDs are not configured prior to power-up, designers need to make
sure that the board powers up in a safe state. Boundary scan can help by allowing access to
CPLD pin-locking pins (which determine the status of CPLD I/O pins), output-enable control
pins (to prevent bus contention) and master resets/clears.

• Some Flash devices require higher-than-usual power supplies to erase and re-load memory
contents. Check that the power supply is adequate and that it is stable (use de-coupling
capacitors alongside the devices to smooth out glitches).

• When a board is powered off, data in an FPGA's SRAM is lost. This data must be re-loaded
when the board is re-powered. PC-based testers can be used to provide the re-configuration.

• During PLD programming, care must be taken to ensure that on-board devices driven by
programmable devices are not affected by random values present on the programmable
device's outputs as programming proceeds. Some PLDs are designed such that their outputs
are in a high-impedance state until programming is complete. If this is not the case, the
boundary-scan instruction HighZ provides a way to hold PLD outputs in the high-impedance
state during programming. PC-based boundary-scan management tools allow
pre-programming macros to be executed to reduce the so-called "nearest neighbour disturb"
problems.

• Some 1149.1 PLDs are not fully compliant with the 1149.1 Standard, e.g. no boundary-scan
register (Extest defaults to Bypass), or no path through from TDI to TDO (especially in some
FPGAs where TDI is simply used to access the configuration registers). Non-compliance
stretches the programming capability of PC-based testers, and the simple solution is to make
sure all PLDs are fully compliant with the 1149.1 Standard.

•  Given  that  PLDs  are  programmed  serially  through  a  chain  of  boundary-scan  devices, 
programming  time  might  be  an  issue  compared  to,  say,  the  time  it  would  take  to  program  in 
parallel  through  the  nails  of  an  in-circuit  board  tester.  This  increase  in  time  is  offset  by  two 
factors  in  favour  of  the  PC-based  board  test  programming  system:  the  portability  and  lower 
cost  of  the  PC-based  system.  Portability  especially  means  that  PLDs  can  be  re-programmed 
even  when  the  system  is  in  operational  use  i.e.  field  upgrades. 
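
The programming-time concern in the last point can be made concrete with a rough lower-bound estimate. The sketch below is an illustration only: the function name and all numerical values are hypothetical, and the estimate counts only shifted bits, ignoring TAP state transitions and any erase or program pulse times required by the device.

```python
def serial_shift_time(config_bits, tck_hz, bypassed_devices=0, scans=1):
    """Lower bound (in seconds) on the time spent shifting data.

    The configuration data are assumed to be split over `scans` scan
    operations; every scan also shifts one extra bit for each device
    that sits in Bypass in the same chain.
    """
    total_bits = config_bits + scans * bypassed_devices
    return total_bits / tck_hz


# Hypothetical example: 1 Mbit of data in 1000 scans, two bypassed
# devices in the chain, TCK running at 10 MHz.
estimate_s = serial_shift_time(1_000_000, 10e6, bypassed_devices=2, scans=1000)
```

Under these assumed numbers the shift time is about 0.1 s, and the extra bits contributed by the bypassed devices are negligible compared with the payload itself.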

There  are  two  popular  programming  formats:  Serial  Vector  Format  (SVF)  and  JAM.  SVF  was 
originally  developed  as  a  format  for  specifying  board  tests  to  be  applied  through  an  1149.1 
infrastructure  and  is  used  by  companies  such  as  ASSET  InterTech  Inc.,  Texas  Instruments  and 
Teradyne  Inc.  It  has  also  been  widely  used  by  PLD  vendors  such  as  Altera,  Lattice  Devices, 
Cypress,  Xilinx  and  Vantis. 

JAM was introduced by Altera and is more of a programming language than SVF. It is not
widely supported by other PLD vendors, except Cypress, but it is now going through a JEDEC
standardisation  process  (JEDEC  Committee  JC-42.1 
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Abstract 

Two on-chip supply current monitors for IDDQ/IDDT testing of low-voltage CMOS
circuits are considered. The proposed monitors were fabricated and their significant
features are presented. Furthermore, an experiment on the built-in supply current
monitoring unit, integrated into a digital design, was carried out in order to evaluate the
feasibility and the applicability of supply current testing as a complementary approach to
conventional test methods. To this end, both developed current monitors were implemented
together with an experimental digital circuit on a single chip and prototyped in
Alcatel-Mietec 0.7 µm 3.3 V CMOS technology. The experimental chip evaluation results are
presented as well.

1  Introduction 

It has been shown that some CMOS physical defects, due to process imperfections,
usually escape logic testing because they do not affect the logic behaviour of a circuit.
However, these 'hard-detectable' defects often significantly reduce the reliability of the
circuit. Therefore, parametric test methods are used to augment logic test and to enhance the
defect coverage. Testing is best performed using a set of test techniques, with
each method dedicated to detecting a class of defects. One of the parametric test techniques,
widely used to detect mostly short defects (GOS), is quiescent power supply current
monitoring (IDDQ testing) [1]-[3]. Nevertheless, the efficiency of IDDQ testing in detecting
open-class defects is limited because these failures may not change the quiescent power supply
current. In these cases, transient power supply current testing (IDDT testing) [4]-[5] can be
conveniently used. A combination of both current test methods might therefore result in a
unified and promising current-based parametric test approach, offering high defect coverage
in CMOS circuits. However, the on-chip implementation of this approach is not an easy task,
and dedicated measurement hardware is needed to perform both IDDQ and IDDT current
monitoring.

In this paper, the design and features of new on-chip current monitors for both quiescent
and transient power supply current monitoring of CMOS circuits are presented.
Furthermore, the implementation of the built-in supply current monitoring unit, consisting of
both monitors, in an experimental CMOS digital circuit is considered. The current
monitoring unit was placed in the VDD power supply line of the circuit under test (CUT). The
experimental chip was fabricated in Alcatel-Mietec 0.7 µm 3.3 V CMOS technology.
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2  Built-In  Current  Monitors 


Quiescent BIC monitor


Firstly, a new quiescent current monitor based on the second-generation current conveyor
(CCII+) principle [6] was designed, fabricated and evaluated. Its general scheme is shown in
Fig. 1. The CCII+ plays the main role in the monitor performance as it provides two important
functions: it conveys the IDD current to the output of the CCII+, and it holds approximately the
same voltage on both its input terminals. Thus, the deviation of the Virtual VDD from the
VDD power supply is very small. Because the CUT draws high transient currents during
switching, a bypass switch has been connected between the VDD and the Virtual VDD
to bypass the monitor while the CUT is in the transient state. The output current of the current
conveyor, which is equal to IDD, is then compared to the reference current Iref by a current
comparator, and the signal Flag is produced as a result of the comparison. Since the comparison
is performed continuously, control circuitry is added to sample the result of the comparison
at the end of the measurement period.

Figure 1 General scheme of QBIC monitor

After fabrication, the prototype chips were measured, and the evaluation results exceeded
expectations in all essential parameters. The monitor offers sensitive current measurement
in a range of IDDQ currents from 50 nA to 600 µA. The best resolution of the monitor is 1 nA
at a testing speed of 5 kHz, which is sufficient to detect any CMOS defect that may occur. The
maximum testing speed of 2 MHz can be achieved if the loading capacitance is low. Over the
whole operating range, the BIC monitor keeps the supply voltage degradation below 50 mV,
which should not affect the performance of the CUT.
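
The interplay of bypass switch, continuous comparison and end-of-period sampling can be summarised in a small behavioural model. The Python sketch below is an idealised illustration, not the fabricated circuit: the function name, the sampled-waveform representation and the example current values are assumptions made here, and the conveyor is treated as ideal.

```python
def qbic_flag(idd_samples, bypass_active, iref):
    """Return the Flag value sampled at the end of one measurement period.

    `idd_samples` holds the CUT supply current at successive instants and
    `bypass_active` tells whether the bypass switch is closed at each
    instant. While bypassed, the supply current flows around the monitor,
    so the comparator sees no current. The comparison runs continuously,
    but only the last result of the period is kept.
    """
    flag = False
    for idd, bypassed in zip(idd_samples, bypass_active):
        monitored = 0.0 if bypassed else idd
        flag = monitored > iref
    return flag


# Hypothetical waveforms: two transient samples followed by two quiescent ones.
bypass = [True, True, False, False]
good_cut = [5e-3, 1e-3, 80e-9, 60e-9]     # settles to tens of nA
faulty_cut = [5e-3, 1e-3, 20e-6, 15e-6]   # settles to tens of µA
```

With a reference current of 1 µA, the model raises the flag only for the second waveform, whose quiescent current stays above the reference at the end of the period.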


Transient  BIC  monitor 

Then, a novel transient current monitor that takes advantage of the metal layer's parasitic
resistance was developed. It is known that the metal interconnections between the core of a
design and its I/O pads always introduce a small parasitic resistance. This small resistance
(assumed to be around 1 Ω) can be used to sense the very high transient supply currents
typically drawn by the CUT during switching. The transistor-level scheme of the circuit
implementing this idea is depicted in Fig. 2. The dynamic supply current IDD flowing through the
CUT always produces a small voltage drop across the parasitic resistance Rmet. This voltage
difference unbalances the current mirror (MP1, MP2), which results in a current Imir at its
output. The whole transient monitor, using the current mirror principle, consists of two main
parts: an unbalanced current mirror that mirrors the transient supply current, and circuitry
that quantifies the charge involved in the supply current. This circuitry
consists of a diode D, a switch Ms, a capacitor Ccharge, and a differential amplifier A. The
general scheme of the whole transient current monitor is shown in Fig. 3. The high transient
peaks of the mirrored supply current Imir pass through the diode D and charge the capacitor
Ccharge.

Initially, in monitoring mode, the transistor Ms is switched off so that the capacitor Ccharge is
charged by the transient supply current, and the resulting voltage is compared to the
voltage reference Vref by the amplifier. Then the switch is closed to discharge the capacitor
Ccharge and to ensure that the amplifier input is set to zero before each transition. Because of
the offset in the output current Imir, the capacitor Ccharge and the switch Ms must be connected
to a voltage Voffset so that no current flows through the diode in the quiescent state of the
CUT.


Figure 2 Current mirror principle of IDD monitoring

Figure 3 General scheme of TBIC monitor


The proposed transient monitor was also fabricated and evaluated. Due to the very small
resistance of the metal layer, the monitor is able to measure very high transient currents without
affecting the CUT's performance. For the assumed Rmet value of 1 Ω, the CUT supply voltage
is lowered to 3.2 V at most for IDD currents up to 100 mA. The linearity of the
current-to-voltage conversion of the monitor over a range of IDDT currents from 1 mA to
100 mA, for f = 1 MHz and a Ccharge value of 0.5 pF, is shown in Fig. 4. The sensitivity of the
monitor in the frequency domain is depicted in Fig. 5, which shows the minimum measurable
transient current spikes versus testing rate for different transient current peak durations.
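
The quoted supply voltage figure follows directly from Ohm's law applied to the sensing resistance. As a quick check, the sketch below reproduces it in Python; the function names are hypothetical, and a nominal supply of 3.3 V is taken from the technology described in the paper.

```python
def sense_voltage(idd, r_met=1.0):
    """Voltage drop (V) across the parasitic metal resistance Rmet."""
    return idd * r_met


def cut_supply_voltage(vdd, idd, r_met=1.0):
    """Supply voltage (V) seen by the CUT behind the sensing resistance."""
    return vdd - sense_voltage(idd, r_met)


# Worst case quoted in the text: IDD = 100 mA through Rmet = 1 Ohm
# with a nominal 3.3 V supply.
worst_case_v = cut_supply_voltage(3.3, 0.100)
```

For 100 mA the drop is 0.1 V, which leaves 3.2 V at the CUT, while at the lower end of the measured range (1 mA) the drop is only 1 mV.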


Figure 4 Linearity of current measurement (axis label: Ipeak [mA])

Figure 5 TBIC sensitivity versus testing speed (axis label: fpeak [kHz]; curves for tpeak = 20 ns, 50 ns and 100 ns)
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3  Implementation  of  the  BIC  Monitoring  Unit  (BIC-MU) 

Finally, in order to verify the feasibility and performance of IDDX current monitoring in
real circuit testing, both current monitors were integrated into a unified built-in supply
current monitoring unit (BIC-MU), which was then implemented together with an
experimental low-voltage digital circuit on a single chip and processed in Alcatel-Mietec
0.7 µm CMOS technology. A digital multiplier with two parallel 8-bit inputs was used as the
digital CUT. The chip area of the digital circuit itself is 850 µm x 850 µm. The BIC testing
system is inserted in the VDD supply line, and the parasitic resistance of the power supply
routing is used to sense the transient supply current. The size of the whole IDDX monitoring
unit is 0.24 mm², which takes around 24% of the total chip area.

The feasibility of the proposed IDDX current monitoring unit was subsequently proven by
evaluation results obtained by measurement of the experimental circuit. Various static and
dynamic measurements were performed to investigate the performance of the prototype chips.
The evaluation results confirmed that both the transient and the quiescent BIC monitor perform
well in conjunction with the CUT. The quiescent supply current of the CUT at room temperature,
as monitored by the QBIC monitor, was 50 nA. The QBIC monitor also clearly indicates that the
CUT supply current goes up as the clock frequency of the CUT increases, which is natural for
digital circuits. It was shown that the transient supply current peaks drawn by the experimental
CUT depend on the amount of logic flipped by the system clock. The CUT is a relatively small
digital circuit that generates transient supply current peaks a few tens of nanoseconds wide and
several mA high, and the detection of such small peak variations is on the margin of TBIC
sensitivity; nevertheless, a maximum detectable peak frequency of 5 MHz was achieved.

4  Conclusion 

A new on-chip IDDX current monitoring system offering high-speed and accurate IDDQ/IDDT
measurement was developed and implemented in an experimental digital circuit. The current
monitoring unit is able to measure a wide range of supply currents without affecting the CUT
performance. The evaluated parameters of the developed current monitors indicate that
on-chip current monitoring is very promising for real test applications.

Acknowledgement 

This work has been supported partially by the EC in the frame of the COPERNICUS Project
UBISTA (COP94 : 0391) and by the Ministry of Education of the Slovak Republic under Grants
No. 1/6096/99 and No. 1/4294/97.

References 

[1] W. Mao, R. K. Gulati, D. K. Goel and M. D. Cilleti, "QUIETEST: A Quiescent Current Testing
Methodology for Detecting Leakage Faults", Proc. of Int. Conf. on CAD, 1990, pp. 280-283.

[2] C.F. Hawkins and J.M. Soden, "Electrical characteristics and testing consideration for gate oxide
shorts in CMOS ICs", Proc. of the 1985 Test Conf., Philadelphia, PA, 1985, pp. 544-555.

[3] W. Maly and M. Patyra, "Built-in Current Testing", IEEE Journal of Solid State Circuits, Vol. 27,
No. 3, March 1992, pp. 425-428.

[4] S-T. Su and R.Z. Makki, "Testing of SRAMs by Monitoring Dynamic Power Supply Current",
JETTA, Vol. 3, 1992, pp. 265-278.

[5] S-T. Su, R.Z. Makki and T. Nagle, "Transient Power Supply Current Monitoring - A New Test
Method for CMOS VLSI Circuits", JETTA, Vol. 6, February 1995, pp. 23-43.

[6] V. Stopjakova and H. Manhaeve, "CCII+ Current Conveyor Based BIC Monitor for IDDQ Testing
of Complex CMOS Circuits", ED&TC'97, Paris, France, March 17-20, 1997, pp. 266-270.


2nd  Electronic  Circuits  and  Systems  Conference 
September  6-8,  1999,  Bratislava,  Slovakia 




IOCIMU-2:  An  Integrated  Off-Chip  IDDQ  Measurement  Unit 

M.Sidiropulos1,  B.Straka1,  M.Svajda2,  H.Manhaeve3,  and  J.Vanneuville1 


1) CEDO, Brno
Czech  Republic 
Phone:+420-5-43  125  412 
Fax:  +420-5-43  125  307 
sidiro@cedo.cz 


2)  Technical  University  of  Brno 
Czech  Republic 
Phone:  +420-5-43  167  103 
Fax:  +420-5-43  167  298 


3)  KHBO,  Oostende 
Belgium 

Phone:  +32  59  508  996 
Fax: +32  59  704  215 
manhaeve@micro.khbo.be 


Abstract 

The implementation of a second-generation integrated
off-chip IDDQ monitor is presented in this paper. The
monitor can be incorporated into standard automated
test equipment. It can operate at test rates of up to
30 kHz and offers a high resolution of 10 nA. It is
capable of driving a 2 µF capacitive load and can
perform IDDQ measurements in the 0-1 mA range,
which enables testing of complex ASICs. The on-chip
integrated bypass switch, with an RON resistance of only
0.3 Ω, is capable of handling DUT transient currents of
up to several amps. The IOCIMU-2 monitor was fabricated
in the 2 µm Mietec BiCMOS technology and has an
active chip area of 10 mm².

1  Introduction 

IDDQ testing has become common in recent years and
is widely used as a supplement to functional tests to
enhance test quality. It is useful for the detection of
physical defects, such as gate-oxide shorts and bridging
defects, which result in an increase in the quiescent
current consumption of the affected circuit. The
effectiveness of the IDDQ test depends on the availability
of a suitable measurement device, the test rate, the
accuracy of the measurement and the quality of the test
vector set used [1].

Basically, two approaches to IDDQ measurement can be
distinguished: on-chip and off-chip measurement. On-chip
IDDQ monitors are integrated with the circuit under test on
the same silicon. On-chip monitors generally exhibit very
high resolution and test rate [2]. On the other hand, on-chip
monitors occupy valuable area on the chip and require extra
I/O pins. On-chip monitors are often a part of built-in
self-test applications [11].

In general, off-chip IDDQ monitors (OCM) offer more
versatility and can be easily combined with automatic
test equipment (ATE). The development of off-chip IDDQ
measurement hardware has been intensively pursued by
research groups in recent years, and it is now becoming
commercially available either as add-on units or as an
ATE option [3-9]. Table 1 shows characteristic parameters
of the discussed monitors. Most of the existing monitors
are designed using discrete components. An OCM
realisation based on well-selected discrete elements can
use high-performance components for each of its building
blocks. A drawback of a discrete realisation is the space
taken up by the board, which may make it difficult to
place the monitor on the load board close to the DUT
without obstructing handlers.

A monolithic OCM can be placed easily on the
load board, close to the DUT. Such a configuration is
part of the QTAG standard proposal [10]. A first
attempt to design a monolithic monitor was made at
Philips. However, their IDUNA-2 monitor [7] was
designed to test a specific class of circuits. It
performed well, but only for a small CL.


Monitor:          BOC [5] | OCIMU [6] | DOCIMU [3] | [8] | IDUNA-2 [7] | IOCIMU [4] | IOCIMU-2
Technology:       discrete | discrete | discrete | CMOS 1.5 µm | CMOS 0.8 µm | BiCMOS 2 µm | BiCMOS 2 µm
Test rate:        250 kHz | 10 kHz | 30 kHz | 20 kHz | 50 kHz | 30 kHz | 30 kHz
IDDQ range:       25 µA, 1 µA-1 mA, 1 mA, 500 µA, 1 mA, 1 mA
Max. resolution:  100 nA, 800 nA, 50 nA, 10 µA, 60 nA, 10 nA
Max. Cdec:        1-4 nF | [illegible] | 100 nF-2 µF | fixed | fixed | 20 nF-2 µF | 20 nF-2 µF
Ts/Cdec:          1 µs/1 nF, 100 µs/1 µF, 100 µs/1 µF, 50 µs/1 nF, 100 µs/1 µF, 100 µs/1 µF
Method:           I-t | I-V | I-V | I-t | I-t | I-V | I-V
Bypass Ron:       0.05 Ω, 0.6 Ω, 0.3 Ω
Area:             10 mm², 1 mm², 20 mm², 10 mm²

Table 1 Review of off-chip monitors. Rows with fewer than seven entries do not cover every monitor.

This paper deals with the design of a
general-purpose semi-digital monolithic off-chip
IDDQ monitor. The IOCIMU-2 is intended
to be a part of the test board. Due to a small
package and a minimal number of external
components, it can be placed very close to the
DUT supply pins. IOCIMU-2 can also
become a part of the test head of the ATE. Thus
the monitor can measure IDDQ of final single
chips as well as wafers.

2  IOCIMU-2  Architecture 

The architecture of the IOCIMU-2 has evolved from its discrete
predecessor, the OCIMU [6], and from a more recent monolithic
prototype, the IOCIMU [4], fabricated and tested in 1997. The
monitor is placed between the positive power supply and the DUT.
The hierarchical structure and principal scheme of the IOCIMU-2
are shown in Figures 1 and 2, respectively.

The monitor is designed as a semi-digital monitor. The performance
of the monitor can be optimised for various capacitive loads and
measurement speeds by means of programmable low-pass first-order
filters (LP) and the programmable gain of the sensing op-amps
(OA1 and OA2).

Figure 1 IOCIMU-2 structure.

The main part of the monitor, the current measuring unit (CMU),
uses two matched-gain op-amps in a differential configuration.
The VDUT pin provides the reference for the DUT power supply
voltage, which is established and supplied at the DUT pin. The
measured IDDQ current is converted to a voltage and amplified. The
resulting VIDDQ voltage is compared with the reference pass/fail
level, given at the VP/F pin, and the output pass/fail flag is
generated at the P/F pin by the threshold unit (THU).

Fig. 2 The structure of IOCIMU-2

There are also four supplementary units: control unit (CU),
current bypass unit (CBU), overcurrent unit (OCU), and
sample-and-hold unit (S/H). The CU realizes the interface with the
intelligent ATE to control the internal operation of the monitor.
The CBU is a charge-feedthrough compensated control circuit
associated with a power NMOS transistor switch. The purpose of the
CBU is to bypass the monitor during changes in DUT power supply
current when a new test vector is applied. If required, the CBU
can steadily bypass the IDDQ monitor without the need for any
additional external reconfiguration circuit. The OCU is activated
when the output of the CMU's op-amp is approaching saturation, and
it temporarily supplies the overcurrent to the faulty DUT.

The use of the S/H minimises the influence of interference at the
VDUT pin, which significantly affects the measurement accuracy in
other monitor designs. For the short moment of the measurement the
DUT is supplied by the interference-free voltage stored on CS/H.

Figure 3 Typical IOCIMU-2 measurement cycle (bypass mode, measurement mode).

The typical measurement cycle of the IDDQ monitor is
shown in Figure 3. The bypass period, during which the
CBU is activated, is initiated by the MODE signal. The
monitor enters the measurement mode on the trailing
edge of the MODE signal. The required measuring
conditions can be set through the serial interface via
the MODE and CLOCK pins at any time. The bypass NMOS
is switched off and the OCU evaluates in less than 1 µs
whether IDDQ exceeds the measurement range of 1 mA.
If IDDQ is higher, the monitor is returned to the
bypass mode until the end of the measurement cycle.
Otherwise, the CMU is activated to perform the accurate
IDDQ measurement in the range under 1 mA. Outputs are
valid at the end of this measurement cycle (60-100 µs),
and the ATE can scan the Pass/Fail (P/F) output flag.
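
The decision flow of one measurement cycle can be sketched in a few lines of Python. This is only an illustrative model, not the chip's control logic: the names `CycleResult` and `measurement_cycle` and the pass/fail limit argument are hypothetical, and treating an out-of-range current as a fail is an assumption.

```python
from dataclasses import dataclass

IDDQ_RANGE_A = 1e-3  # measurement range stated in the text (1 mA)


@dataclass
class CycleResult:
    mode: str     # "bypass" if the overcurrent check tripped, else "measure"
    passed: bool  # pass/fail flag as it would appear on the P/F pin


def measurement_cycle(iddq_a: float, limit_a: float) -> CycleResult:
    """Model one cycle after the trailing edge of MODE (hypothetical helper)."""
    # Step 1: overcurrent check. Above the range the monitor returns to
    # bypass for the rest of the cycle; reporting a fail here is an assumption.
    if iddq_a > IDDQ_RANGE_A:
        return CycleResult(mode="bypass", passed=False)
    # Step 2: accurate measurement, then comparison with the pass/fail limit.
    return CycleResult(mode="measure", passed=iddq_a <= limit_a)
```

For example, `measurement_cycle(5e-6, 1e-5)` models a 5 µA device checked against a 10 µA limit and stays in the measurement path.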

3 Implementation and Measurement Results

The IOCIMU-2 is implemented and fabricated in the
2-µm Mietec high voltage BiCMOS technology using
the Europractice MPC services. The picture of the
IOCIMU-2 chip die is shown in Figure 4. A better layout
technique reduced the active area of the circuit
to 10 mm², which is approximately half that of the
original IOCIMU circuit.


Figure  4  Picture  of  the  IOCIMU-2. 


The main problem of integrating precision
instruments in standard technologies is the high absolute
tolerance of device parameters due to process variations
(up to 20%). On the other hand, components placed on
the same substrate exhibit excellent matching. In the
ideal case, the I-V conversion ratio of the monitor is
5.00 mV/µA. Due to the process tolerances, the deviation
of the on-chip sense resistors is about 5%. However, this
deviation is compensated by means of a differential
architecture with matched sensing resistors, which
results in a measurement error under 1%. If a voltage is
applied as a reference, an additional error due to V-I
conversion should be considered. The comparator exhibits
no significant DC errors.
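
As a small worked example of the conversion ratio, the following Python sketch turns an IDDQ value into the corresponding output voltage. It assumes that the pass/fail reference is expressed on the same voltage scale as the converted current; the function name and the 10 µA limit are hypothetical and chosen only for illustration.

```python
CONVERSION_MV_PER_UA = 5.00  # ideal I-V conversion ratio quoted in the text


def iddq_to_voltage(iddq_ua: float, gain_error: float = 0.0) -> float:
    """Return the converted voltage in volts for a current in microamps."""
    return iddq_ua * CONVERSION_MV_PER_UA * (1.0 + gain_error) / 1000.0


full_scale_v = iddq_to_voltage(1000.0)      # 1 mA full-scale current
limit_v = iddq_to_voltage(10.0)             # hypothetical 10 uA pass/fail limit
worst_case_v = iddq_to_voltage(10.0, 0.01)  # same limit with a 1 % error
```

Under these assumptions a full-scale current of 1 mA maps to about 5 V, and a 1 % conversion error shifts a 50 mV threshold by about 0.5 mV.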

The parameters of the IOCIMU-2 and its resolution as a
function of the test rate and the loading capacitance CL
are shown in Table 2 and Figure 5, respectively.


Parameter (Min. / Typical / Max.)

Die area:        10 mm²
Supply voltage:  12 V / 15 V / 18 V
DUT supply:      0.5 V / 3-5 V / 7 V
Power cons.:     36 mW
IDDQ range:      1 mA, 1.5 mA
Resolution:      10 nA / 50 nA / 3 µA
Test rate:       2 kHz / 10 kHz / 30 kHz
CL:              20 nF / 100 nF / 2 µF
Bypass Ron:      0.3 Ω

Table 2 IOCIMU-2 parameters.


Figure  5  Measured  IOCIMU-2  resolution. 


4  Conclusion 

IOCIMU-2 is a high-performance general-purpose
current monitor. It performs well for IDDQ currents
in the range from 0 to 1 mA, a loading capacitance
between 20 nF and 2 µF, and DUT supply voltages VDUT
in the range of 0.5 to 7.5 V. A maximum test rate of
30 kHz, with a resolution of 1.2 µA, was reached for
a loading capacitance CL of 0.1 µF. The maximum
resolution of 10 nA at CL = 100 nF and a 2 kHz test
rate is an excellent value for an off-chip IDDQ monitor.
IOCIMU-2 proves to be a good piece of IDDQ test
hardware. It is therefore expected to be used as a real
industrial product that enhances the capabilities of
testers.

Abstract. When monitoring circuit temperature, measurements taken
from  temperature  sensors  are  usually  corrupted  with  noise.  Since  inverse 
problems  associated  with  temperature  estimation  are  sensitive  to  errors, 
special  techniques  must  be  used  to  assure  good  quality  of  heat  source 
temperature  estimates.  In  this  paper,  the  adaptive  QR-RLS  algorithm 
is  employed  to  produce  robust  dissipated  power  density  estimates. 
Numerical  simulation  results  for  the  new  approach  are  provided. 


I.  Introduction 

Many  modern  applications  require  continuous  on-line  monitoring  of  circuit  temperature. 
One  of  the  most  popular  methods  of  such  monitoring  uses  remote  temperature  sensors,  e.g. 
p-n  junctions,  placed  on  the  IC  surface.  The  main  drawback  of  this  approach  is  that 
measurements  from  temperature  sensors  are  corrupted  with  noise  and,  since  the  inverse  heat 
conduction  problem  (IHCP)  consisting  in  estimating  source  temperature  is  extremely  sensitive 
to  errors,  it  is  necessary  to  filter  noise  out.  More  information  on  IHCPs  can  be  found  in  [1]. 

The  most  common  method  used  to  improve  the  estimation  quality  consists  in  placing 
redundant  temperature  sensors,  i.e.  more  sensors  than  heat  sources,  rendering  the  system 
overdetermined.  In  our  simulations,  we  include  an  additional  filtering  stage  implementing  the 
inverse  QR  decomposition  based  recursive  least  squares  (QR-RLS)  adaptive  algorithm 
to  produce  improved  dissipated  power  density  estimates.  When  the  dissipated  power 
is  estimated,  it  is  possible  to  determine  temperature  distribution  in  the  whole  circuit  solving 
a  direct  heat  conduction  problem.  The  QR  decomposition  is  performed  using  the  Givens 
rotations. 

II.  QR-RLS  Adaptive  Algorithm 

Adaptive  filters  are  digital  filters  with  adjustable  coefficients.  Their  operation  relies 
on  an  iterative  algorithm  used  to  modify  digital  filter  coefficients.  Filters  of  this  kind  are 
essential  if  the  desired  signal  is  corrupted  with  noise  varying  in  time  or  occupying  unknown 
spectral  band.  In  stationary  environments,  algorithms  converge  to  the  optimal  solution  or, 
in  non-stationary  environment,  they  track  the  signal  characteristic  changes.  Thus,  an  adaptive 
filter  used  to  filter  out  noise  from  temperature  sensor  signal  appears  to  be  an  appropriate  tool 
for  dissipated  power  estimation  purposes. 

There exists a wide variety of adaptive filter algorithms. The choice of a particular
algorithm  is  based  on  its  characteristics,  such  as  misadjustment,  steady-state  fluctuation, 
numerical  stability  or  rate  of  convergence.  The  QR-RLS  algorithm  employed  in  the  conducted 
numerical  simulations  belongs  to  the  group  of  square  root  adaptive  algorithms,  in  which 
instead  of  processing  the  input  data  correlation  matrix  its  square  root  is  processed,  hence  their 
name. 

As mentioned earlier, the QR-RLS algorithm is based on the QR matrix decomposition.
One of many possible ways to perform the QR decomposition is to apply the Givens rotations.
The Givens rotations, suggested for the first time by Jacobi, are used to rotate a vector
by an arbitrary angle θ without changing its Euclidean norm. When applied to the QR
decomposition, the Givens rotations serve to annihilate selectively chosen elements in the
input data matrix, changing only two columns or two rows of the matrix. By performing several
successive Givens rotations, all the required elements can be brought to 0 and the QR
factorization of the input data matrix can be completed. For the full description of the
decomposition technique, refer to [2].

The operation of the algorithm is summarized in equations 1-3. With each newly arriving
input data vector x_k, successive simple Givens unitary rotations are performed on the
left-hand side pre-array to produce a zero block in the top right corner of the right-hand
side post-array matrix (see equation 1). The matrix Θ_k is the product of simple Givens
rotation matrices. The main disadvantage of the simple form of the QR-RLS adaptive algorithm
is that the weight vector W_k is not given in an explicit way, which limits the number of
possible applications; therefore the authors used the inverse QR-RLS algorithm. In this
version of the algorithm, instead of operating on the input data correlation matrix Φ, the
square root of its inverse, P^{1/2}, is propagated. This solution in turn requires the
pre-processing of the input data vector (see [3] for details).

\[
\begin{bmatrix}
1 & \lambda^{-1/2}\, x_k^{T} P_{k-1}^{1/2} \\
0 & \lambda^{-1/2}\, P_{k-1}^{1/2}
\end{bmatrix}
\Theta_k =
\begin{bmatrix}
\gamma_k^{-1/2} & 0^{T} \\
K_k \gamma_k^{-1/2} & P_k^{1/2}
\end{bmatrix}
\qquad (1)
\]

\[
K_k = \left( K_k \gamma_k^{-1/2} \right) \left( \gamma_k^{-1/2} \right)^{-1} \qquad (2)
\]

\[
W_k = W_{k-1} + K_k\, \xi_k^{T} \qquad (3)
\]

where:

K - filter gain
W - filter weight matrix
Φ - input signal correlation matrix
k-1, k, k+1 - sampling instants
γ - conversion factor
P - inverse of input signal correlation matrix
X - input data matrix
Θ - unitary transformation
λ - memory parameter; 1 > λ > 0
ξ - a priori estimation error
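
To make the roles of the gain, the a priori error and the memory parameter concrete, here is a minimal Python sketch of one update step. Note that it uses the conventional (non-square-root) RLS recursion, which propagates P directly, rather than the inverse QR-RLS array form used in the paper; the function name and the list-based data layout are assumptions. The weight update line has the same form as equation (3) for a single output.

```python
def rls_update(w, p, x, d, lam=0.99):
    """One conventional RLS step; returns the updated (w, p).

    w: weight list, p: inverse correlation matrix (list of rows),
    x: input vector, d: desired sample, lam: memory parameter.
    """
    n = len(w)
    px = [sum(p[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(x[i] * px[i] for i in range(n))
    k = [v / denom for v in px]                   # gain vector
    xi = d - sum(w[i] * x[i] for i in range(n))   # a priori estimation error
    w_new = [w[i] + k[i] * xi for i in range(n)]  # weight update
    # P_k = (P_{k-1} - K_k x^T P_{k-1}) / lam
    xp = [sum(x[i] * p[i][j] for i in range(n)) for j in range(n)]
    p_new = [[(p[i][j] - k[i] * xp[j]) / lam for j in range(n)]
             for i in range(n)]
    return w_new, p_new
```

A typical start is a zero weight vector and P set to a large multiple of the identity matrix; the square-root forms exist because this direct recursion is numerically less robust.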

III. Simulation Results

In  the  computer  experiments,  an  IC  model  containing  3  heat  sources  and  5  temperature 
sensors  placed  on  its  top  surface  was  considered.  The  main  objective  of  the  simulations  was 
to  investigate  the  possibility  of  improving  dissipated  power  estimates  using  adaptive  filtering. 
Given  the  heat  sources  and  temperature  sensors  configuration,  all  the  coefficients  relating 
sensor  temperatures  to  the  densities  of  power  dissipated  in  the  heat  sources  were  computed 
from  the  analytical  solution  of  the  heat  conduction  equation.  The  detailed  description  of  the 
solution  method  can  be  found  in  [4]. 

The assumed power pulses, which change independently in each of the heat sources, are
considered to comprise combinations of linear and step-like functions. Based on the computed
coefficients, thermal responses at the sensor positions were calculated. Then, a set of 600 test
input temperature samples was contaminated with noise of zero mean value and a
standard deviation of 0.1 K. The generated data set was used to test the effectiveness of the
QR-RLS adaptive algorithm. The number of filter taps was equal to the number of sensors.
The algorithm memory parameter λ was experimentally set to 0.99. Due to the scarcity
of space, only the results obtained for the hottest heat source are presented. The estimation
quality improvement can be assessed from Figs. 1-2, which show the mean value and standard
deviation of the difference between the estimate and the original uncorrupted power pulse.
The dashed and dotted lines represent the ordinary LMS space averaged estimate and the QR-RLS
adaptive filtered estimate, respectively. The first ten sample values required for the algorithm
to adapt have been omitted. The maximal value of the power signal for this particular heat
source was 750 W/mm².

As  can  be  seen,  the  estimate  mean  values  are  very  close  to  the  actual  signal  value  (3  % 
of  the  maximal  signal  value).  The  filtered  estimate  standard  deviation  value  dropped  to  less 
than  20  %  of  the  maximal  signal  value  and  was  almost  40  %  smaller  than  for  the  space 
averaged  estimate.  The  estimate  improvement  attained  for  the  other  sources  was  even  better 
(over  50  %  reduction  of  standard  deviation  value). 

IV.  Conclusions 

In  the  light  of  conducted  computer  experiments,  the  QR-RLS  adaptive  algorithm  proved 
to  be  particularly  useful  for  dissipated  power  estimation.  The  obtained  estimate  values  were 
very  close  to  the  actual  power  signal  values.  Since  the  standard  deviation  of  the  adaptive 
filtered  estimate  was  significantly  smaller  than  the  ordinary  space  averaged  estimate,  the 
adaptive estimates can be considered more accurate and more robust.

References 

[1] M. N. Ozisik, Heat Conduction, John Wiley & Sons Inc., 1993

[2] G. H. Golub, C. F. Van Loan, Matrix Computations, The Johns Hopkins University Press,
1996

[3] S. Haykin, Adaptive Filter Theory, Prentice-Hall International, 1996

[4] M. Janicki, M. Zubert, A. Napieralski, “Application of inverse heat conduction methods
in temperature monitoring of integrated circuits”, Sensors & Actuators A: Physical,
Vol. 71, pp. 51-57, Nov. 1998


[Plot: "Estimate mean - QR-RLS algorithm"; horizontal axis: sample number; vertical axis: mean (estimate - signal) [W/mm²]]

Fig. 1. Mean value of the difference between power signal and its estimate

[Plot: "Estimate deviation - QR-RLS algorithm"; horizontal axis: sample number; vertical axis: deviation (estimate - signal) [W/mm²]]

Fig. 2. Standard deviation of the difference between power signal and its estimate

Abstract. The paper describes the design, construction and results
obtained with a wireless information transmission measurement system. The
system is intended for the measurement of small pressure variations and
wireless information transfer. The active part has the form of an
absorption resonance frequency meter. The measurement control,
approximation computation and results processing are performed on a PC.
The passive one-chip resonant circuit has been realised as a monolithic
integrated sensor in CMOS technology.

I. Introduction

For the design of a wireless pressure measurement with a capacitive sensor it is possible
to utilise, for example, the absorption principle. It follows from the measurement principle
that a resonant circuit must be created, part of which is the capacitive pressure sensor. The
target was the design of an automatic wireless measurement system for evaluating capacitance
variations of a pressure sensor with sufficient accuracy. Due to the rather small Q-factor
it is very difficult to detect the rather flat top of the resonance response, or to discriminate
between the resonance itself and the inherent oscillator amplitude. These difficulties can be
suppressed by including a computer in the measurement system. It seeks the extreme
by mathematical calculation (zero value of the derivative of a function). The
measurement system consists of two parts (Fig. 1). The first part is the passive resonant circuit
whose capacitance is pressure-dependent. The second part consists of several blocks. The
absorption meter proper has its output connected to the PC through A/D and D/A
converters. The communication with the computer flows through an interfacing device
(measurement card) to the respective converters.
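
One simple way to locate an extreme on a flat resonance top by setting a derivative to zero is three-point parabolic interpolation. The Python sketch below is only an illustration of that idea under stated assumptions: the paper does not specify the approximation used, the samples are assumed to be taken at a uniform frequency step, the resonance is assumed to appear as a maximum, and the function name is hypothetical.

```python
def parabolic_peak(f, a):
    """Estimate the frequency of the extremum from equally spaced samples.

    f: list of frequencies (uniform step), a: measured amplitudes.
    """
    # Largest interior sample, so that both neighbours exist.
    i = max(range(1, len(a) - 1), key=lambda n: a[n])
    y0, y1, y2 = a[i - 1], a[i], a[i + 1]
    denom = y0 - 2.0 * y1 + y2
    if denom == 0.0:
        return f[i]  # locally flat: fall back to the sample itself
    step = f[i + 1] - f[i]
    # Vertex of the parabola through the three points (derivative equals zero).
    return f[i] + 0.5 * step * (y0 - y2) / denom
```

For a resonance that shows up as a dip in the oscillator amplitude, the same function can be applied to the negated amplitudes.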


Fig. 1. Measurement system

The construction and design of the absorption meter is governed by the required
frequency variation of the resonant circuit, the adjustable step of frequency variation and the
Q-factor of the resonant circuit. The Q-factor is rather low in single-chip integrated resonant
circuits, generally of the order of units. Development of an integrated resonant circuit with a
capacitive pressure sensor became the subject of our interest. Capacitive pressure sensor
structures differ in the pressure measured (absolute or relative), in the shape of the diaphragm
(circular, square, donut) and in technology (monolithic, glued). From the point of view of
precision and reproducibility, the exact dimensions of the manufactured diaphragm play the
main role. Sensitivity and the pressure range are determined above all by the diaphragm
geometry and properties. Several techniques can be used to realise precision diaphragms.

II.  The  design  and  realization  of  the  wireless  sensor  system 


A. The absorption/frequency meter

In the design of the absorption meter we took the parameters of the
integrated resonant circuit and the required measurement conditions as the basis. These produced
a requirement for a maximum capacitance variation of the sensor of 5% over the measured
range. It follows from the elementary equations of the resonant circuit that
this variation corresponds to a resonant frequency variation of 2.4% maximum. With a
resonant frequency of 100 MHz this means a maximum full-range frequency variation of 2.4 MHz.
Further, it follows from the required parameters that the minimum required number of steps is
100. Thus a minimum resolution of 24 kHz is given.
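
The figures in this paragraph can be checked with a few lines of Python. The sketch assumes an ideal LC circuit with constant inductance, so that the resonant frequency scales with the inverse square root of the capacitance; the variable names are chosen for illustration only.

```python
import math

f0_hz = 100e6         # resonant frequency from the text
cap_variation = 0.05  # maximum relative capacitance change
steps = 100           # minimum number of tuning steps

# f = 1 / (2*pi*sqrt(L*C)), so f scales with C**-0.5 at constant L.
rel_shift = 1.0 - 1.0 / math.sqrt(1.0 + cap_variation)
span_hz = rel_shift * f0_hz
resolution_hz = span_hz / steps
```

With these inputs the relative shift comes out at about 2.4 %, the span at about 2.4 MHz and the step at about 24 kHz, in line with the values quoted above.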

The data from [3] were used in the circuit design. A circuit with parameters satisfying
the requirements was designed according to Fig. 2. The basic circuit is a Colpitts oscillator. In
the design it was necessary to adhere to the general principles of high-frequency design rules.
In order to minimise the sensitivity to external electromagnetic interference, the output signal
was taken from the transistor collector.


Fig.  2.  The  circuit  connection  of  the  absorption 
meter 


Fig.  3  The  equivalent  scheme  of  the 
sensor  with  parasitic  elements 


B.  Integrated  passive  resonance  circuit 

The structure design was determined by the intended application. The maximum dimensions
of the whole integrated circuit were limited to 3x3 mm. The maximum required resonant
frequency is 100 MHz. A capacitor area of 1x1 mm was chosen. The assumed active
capacitance was about 5 pF. A planar inductor, such as in [2], was chosen for the design; with a
wire width/gap ratio of 2:1, the empirical rule can be used to calculate its inductance L. Parasitic
parameters of the resonant circuit play a significant role in the design and must be included
in the calculations. The dominant part of the parasitic capacitance is located in the
areas where the aluminium diaphragm is attached to the silicon dioxide layer. The parasitic
resistor is located between the two capacitors. The equivalent scheme of the sensor with
parasitic elements is shown in Fig. 3. CA designates the active capacitance, Cm is the parasitic
inductor winding-substrate capacitance, L is the inductance of the inductor and Rs its parasitic
series resistance, Rp is the parasitic resistance of the substrate between the p-n junction and the
inductor, and Rsp is the parasitic series contact resistance. MOS technology steps were available
for the development of the sensor structure. The standard technological process was modified for
the realised samples: the polysilicon layer was omitted and the diaphragm was made of aluminium.
Various modifications of the sensor structure were realised on the wafer. The first structure was
divided into units of 1 mm length lumped into small capacitors in parallel of 90x90, 130x130
and 160x160 µm geometry. The second were finger structures, the third were spiral structures
and the last was an inverted structure. Inductors were modified with 5, 7 or 9 windings in an
aluminium layer of 1.2 µm thickness, and 3.6 µm in the case of the planar inductor.

C.  The  PC  interface 

The suggested interface provides communication through a converter and the I/O circuits
of the computer. It must provide A/D conversion for the evaluation of the oscillator
amplitude variation, D/A conversion for the oscillator tuning control signal, sufficient A/D
and D/A speed with a minimum sampling frequency of 80 kHz, and sufficient A/D converter
resolution (minimum 100 levels). These requirements can be fulfilled by the PCA-1208
multifunction measurement card. The card allows simple parameter configuration.
When installing it into the PC, the only hardware settings necessary are to set the base card
address, to choose the voltage range of the D/A converter (unipolar or bipolar logic), and to
determine the transfer direction of the digital ports or the mode of their software control.
All other functions are software-controlled. Its physical dimensions permit its installation
even in a notebook PC.

III.  The  results  achieved 

A.  Absorption  meter  and  measuring  system 

Frequency  stability  of  the  absorption  meter  oscillator  was  in  the  range  from  10'5  to  10 
depending on the coil used. Measurement of the transfer characteristic of the absorption
meter  is  important  for  nonlinearity  determination.  The  transfer  characteristics  for  different  coils 
are  presented  in  Fig.4. 


Fig. 4. The transfer characteristics for different coils

Fig. 5. The transfer characteristic of the complete system

The  transfer  characteristic  of  the  complete  system  is  presented  in  Fig.5.  For  good  accuracy  and 
reproducibility  of  the  measurement  the  correct  position  of  the  measuring  and  measured  coil  must 
be  set,  as  well  as  a  proper  coupling  between  the  coils. 

B. One-chip integrated resonance circuits

Properties  of  the  samples  realised  were  measured  directly  on  the  chip,  some 
measurements  were  performed  on  encapsulated  structures.  For  chip  measurements  needle 
probes,  a  microscope  and  a  RLC-meter  HP  421 7B  were  used.  The  inductance  of  needle  probes 
and  external  wires  of  the  measurement  instruments  was  about  370  nH.  Elimination  of  the 
additional  parasitic  back  electrode-substrate  and  inductor  wire-substrate  capacitance  was 
necessary  for  measurement  of  the  active  capacitance  Ca.  The  dependence  of  the  capacitance  on 
illumination  has  to  be  also  taken  into  consideration.  Measurement  of  the  capacitance  was 
performed  after  cutting  off  mechanically  the  connection  lines  to  the  inductor. 

An impedance meter HP4194A was used to measure the frequency response of the
resonant LC circuits. These measurements were performed up to 40 MHz on samples
encapsulated in a DIL 16 package. Parameters of a simple equivalent circuit consisting of a series
combination of inductance L, capacitance Cc and lossy resistor Rs were calculated. An example of
the measured resonant curve with its parameters is shown in Fig. 6. Some photographs of the
realised structures are in Fig. 7.


[Plot: measured resonant curve; frequency axis from 100 kHz (start) to 40 MHz (stop)]

Fig. 6. Example of a measured resonant curve
of the normal structure, 9 windings/130 µm


Fig.  7.  Some  photographs  of  the  realised 
structures 


IV. Conclusion

The absorption meter and structures of an integrated resonant circuit with a capacitive
pressure sensor were designed and realised. The structures were made in various modifications
of the shape, i.e. inductor outside the capacitor, inductor inside the capacitor, spiral structures
and finger structures. Parasitics were considered during the measurement. Resonant frequencies
were measured. The resonant curves show very low Q (units). The absorption meter is connected to
the PC through an interface. The interface operation is controlled by the PCA-1208 circuit.

The system makes possible contactless measurement of the capacitance variation (through
the resonant frequency variations). The peak of the resonant curve is calculated from the
measured values using the PC. The transfer characteristics of the complete system were measured.
Frequency  stability  of  the  absorption  meter  oscillator  is  10  *  to  10  at  operational  frequency  100 
MHz, depending on the coils used. The PC is used for data storage and processing.

Abstract. The coverification of a given HW/SW system consists of
checking  whether  the  implementation  of  the  software  and  hardware  parts  and  their 
integration  fulfill  or  not  some  or  all  the  specification  requirements  of  this  system.  In  the 
case  of  a  distributed  model,  the  SW  and  HW  system  blocks  are  described  respectively 
by  HLL  (High  Level  Language)  and  HDL  (Hardware  Description  Language)  codes. 

When  dealing  with  a  unified  model,  both  SW  and  HW  components  are  implemented  in 
the  same  language  such  as  Java.  In  this  paper,  we  propose  a  tool  that  allows  designers 
to  specify  the  properties  of  their  systems  in  CPL  (Coverification  Properties  description 
Language),  and  performs  the  coverification  by  simulation.  The  engine  of  this  tool  is 
implemented  using  the  Java  programming  language  and  is  mainly  based  on  the 
management  of  threads. 

I.  Introduction 

To avoid a potentially lengthy iterative codesign process for a given HW/SW system (which is
often the result of an unsuccessful integration of the hardware and software parts), an early
coverification of the HW/SW integration before prototyping is strongly required. Coverification
techniques  currently  used  in  the  microelectronics  industry  are  based  on  cosimulation  environments, 
where  SW  and  HW  elements  are  respectively  described  in  HLL  (High  Level  Language)  and  HDL 
(Hardware  Description  Language),  and  simulated  with  specific  simulators.  The  properties  of  a 
HW/SW  system  are  included  in  its  implementation  codes,  and  are  not  separately  expressed.  The 
management  of  these  properties  (such  as  properties  review,  editing,  description,  checking,  etc.)  is 
hence  time-consuming  and  inefficient. 

We present in this paper a tool for the coverification of HW/SW systems. This tool is
thread-oriented and currently assumes a unified description model of HW/SW systems. The software
part of the HW/SW system and its interaction with the hardware part are described as a set of
communicating threads. The system properties are first described in CPL (Coverification Properties
description Language) code and then converted to Java threads in order to perform the
coverification process.

The  rest  of  this  paper  is  organized  as  follows:  Section  II  briefly  reviews  previous  work  related 
to  coverification.  Section  III  presents  our  proposed  thread-oriented  tool  of  coverification.  Section  IV 
illustrates  the  description  of  HW/SW  system  properties  using  CPL  and  their  coverification.  Finally, 
Section  V  concludes  the  paper. 

II.  Related  work 

The coverification process is tightly coupled with cosimulation. Known coverification tools
such as CVE-Seamless [1], Eaglei [2] and Ptolemy [3] are mainly based on cosimulation techniques.
The cosimulation approach simulates the software and hardware parts and their interactions. This is
known as virtual integration, in the sense that the integration is made long before the prototyping
process. In the case
of  a  unified  model  [4,  5],  only  one  simulator  is  used  to  perform  cosimulation  but  when  dealing  with  a 
distributed  model  [6,  7,  8,  9,  10],  two  or  several  simulators  are  required  depending  on  how  many 
languages  are  used  to  describe  the  software  and  hardware  parts  of  the  HW/SW  systems.  Other  related 
work  presents  application  cases  such  as  the  coverification  of  DPLL  (Digital  Phase  Locked  Loop)  at 
NORTEL  [11],  and  the  HW/SW  coverification  performance  estimation  of  a  24  embedded  RISC  core 
design  at  SIEMENS  [12].  On  the  other  hand,  there  are  few  papers  in  the  literature  that  deal  with  the 
coverification  using  formal  verification  methods  such  as  model  checking  [13,  14]. 

III. Multithreading-based coverification

When the design of the hardware and software parts of a given HW/SW system is nearly complete,
designers integrate them to obtain the implementation of the whole system. The software part
contains the original SW-code of the system and some data. The original SW-code encompasses memory
addressing, hardware interfacing, and any functions that partitioning has assigned to software. The
hardware part is a set of circuit modules, registers, memories, IP (Intellectual Property) blocks,
etc. The interface between the hardware and software parts is built around a processor that runs
the software code and manages the hardware signals. The problem of coverification is to verify
whether this global implementation satisfies the specification requirements of the system. This
leads to checking concurrently whether the software and hardware implementations satisfy their
corresponding software and hardware specifications, and whether the HW/SW integration respects the
requirements of the global specification. Due to their complexity and heterogeneity, real HW/SW
systems challenge current methods of verification and simulation.

The basic idea of our coverification technique consists of mixing the cosimulation process with
a set of properties to be checked concurrently [5]. This technique is summarized in four steps. The
first step organizes the software part as a set of threads. The second step stores in observable
registers the hardware signals that are parameters of the HW/SW properties. The third step puts the
system properties in threads and manages them. Finally, the fourth step starts cosimulating the
system augmented by its properties. The execution flow of these steps is depicted in Figure 1.
Details of the implementation and the description of the system properties are given in the
following sections.
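The four steps can be illustrated with a minimal sketch. It is written in Python rather than in the Java used by the tool, and it is not the tool's implementation: the names `Registers`, `software_task`, `check_property` and `cosimulate`, as well as the signal `current` and its bound, are hypothetical and chosen only for illustration.

```python
import threading

class Registers:
    """Step 2: observable registers that hold the hardware signal values."""
    def __init__(self):
        self._lock = threading.Lock()
        self._values = {}

    def write(self, name, value):
        with self._lock:
            self._values[name] = value

    def read(self, name):
        with self._lock:
            return self._values.get(name)

def software_task(regs, steps):
    """Step 1: the software part as a thread that drives a hardware signal."""
    for i in range(steps):
        regs.write("current", 0.01 * i)

def check_property(regs, name, predicate, samples, violations):
    """Step 3: a property thread that samples a register and records violations."""
    for _ in range(samples):
        value = regs.read("current")
        if value is not None and not predicate(value):
            violations.append((name, value))

def cosimulate():
    """Step 4: run the system threads together with the property threads."""
    regs = Registers()
    violations = []
    threads = [
        threading.Thread(target=software_task, args=(regs, 20)),
        threading.Thread(target=check_property,
                         args=(regs, "p_current", lambda c: c <= 0.2, 100, violations)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return violations
```

In this sketch the software thread never writes a value above 0.19, so `cosimulate()` returns an empty list of violations whatever the thread interleaving is.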

IV.  Description  and  coverification  of  HW/SW  properties 

A.  Specification  of  properties  in  CPL 

The specification requirements of a HW/SW system are expressed as a set of properties
described in CPL. CPL is a simple language that we have developed specially for the description of
HW/SW properties. These properties are validated by the coverification process. The parameters of
coproperties (HW/SW properties) might belong to the HW part, to the SW part, or to both. In
CPL, we specify which properties are concurrent and which can be executed together sequentially. For
instance, in the example of Figure 2, p1 and p2 are concurrent with each other but p2 and p3 are
not. p2 and p1 are independent in terms of concurrence and they are treated as a single property. The
rest of the properties are considered sequential by default and are managed as a composed
property. We note that CPL is well suited to the description of properties at the behavioral level,
and it could be used at the RTL level as well.
Figure 1: Execution flow of the steps of our coverification approach


p1 property ( flow_input - flow_output * sqr(2) <= 0.5 ) end

p2 property if ( temperature == 199 ) ( current <= 0.2 ) end

p3 property if ( signal_1 == 199 ) not( signal_2 = 134 ) end

p4 property switch ( state )
{
    case S1 : not(signal_ack & signal_start);
    case S2 : (red_light & ~(green_light & orange_light));
    default : (reg_R = 1);
}
end

concurrent ( p1, p2 );

sequential ( p2, p3 );

Figure 2: Example of properties described in CPL

B.  Coverification  of  properties 

The CPL code describing the properties is translated to a strongly thread-oriented subset of
Java code. The properties declared as concurrent in the CPL code are each handled as a separate
thread; those identified as sequential are translated and treated as one thread. The remaining
properties, which are parameters of neither a “concurrent” instruction nor a “sequential”
instruction, are put in one thread and validated as a composed property. The threads obtained after
translation are added to the system implementation in order to perform functional simulation of the
whole code. For the implementation of the HW/SW system, we use a Java unified model in which the HW
part is seen as a set of registers with read/write accesses. The draft tool of this coverification
process is depicted in Figure 3.
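The mapping of properties to threads can be sketched as a small grouping function. The sketch is in Python rather than Java and is not the translator of the tool; the function name `group_into_threads` is hypothetical. The paper does not say how a property that appears in both a concurrent and a sequential declaration (such as p2 in Figure 2) is handled, so the sketch assumes that it stays with its sequential group.

```python
def group_into_threads(properties, concurrent, sequential):
    """Return a list of groups; each group is checked, in order, by one thread."""
    seq_members = {p for group in sequential for p in group}
    conc_members = {p for group in concurrent for p in group}
    groups = []
    # Every sequential declaration becomes one thread.
    for group in sequential:
        groups.append(list(group))
    # Every concurrent property gets its own thread
    # (assumption: unless a sequential group already contains it).
    for p in properties:
        if p in conc_members and p not in seq_members:
            groups.append([p])
    # All remaining properties share one thread as a composed property.
    rest = [p for p in properties if p not in conc_members and p not in seq_members]
    if rest:
        groups.append(rest)
    return groups

# Declarations of Figure 2: concurrent (p1, p2); sequential (p2, p3);
thread_groups = group_into_threads(
    ["p1", "p2", "p3", "p4"],
    concurrent=[("p1", "p2")],
    sequential=[("p2", "p3")],
)
```

Under the stated assumption the four properties of Figure 2 end up in three threads: one for p2 followed by p3, one for p1, and one for the remaining property p4.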


V.  Conclusions 

In  this  paper,  we  have  presented  a  tool  for  the  coverification  of  HW/SW  systems.  This  tool  is 
based  on  the  multithreading  concept  and  cosimulation  technology.  The  properties  of  a  given  HW/SW 
system  are  described  in  CPL,  a  simple  language  developed  specially  for  the  description  of 
coverification  properties.  The  whole  implementation  of  the  HW/SW  system  and  its  properties  appears 
as a set of communicating threads. As future work, we plan to complete this tool and make it fully
automatic in the sense of managing properties and generating efficient test vectors for each property.
Abstract. In this paper we present new features for an improved
partitioning  and  a  parallelising  and  optimising  hardware  compilation  integrated  in 
the  hardware/software  co-design  environment  DICE  (Darmstadt  Interactive 
Codesign  of  Embedded  Systems). 


I.  Introduction 

The new methods introduced in this contribution are part of the DICE framework [6]
for hardware / software co-design (Figure 1). The system to be designed with DICE is
specified  by  concurrent  VHDL  and  C  processes.  In  the  first  step  this  description  is  converted 
to  a  concurrent  control  data  flow  graph  (CCDFG).  Based  on  this  graph  a  semi-interactive 
approach  for  the  hierarchical  HW/SW  partitioning  with  heterogeneous  granularity  is  performed 
by  the  designer  using  the  HiPART  [6]  graphical  user  interface.  After  an  initial  partitioning  step 
the  system  can  be  repartitioned  and  the  hardware  and  software  parts  can  be  migrated  from  C  to 
VHDL  and  vice  versa.  In  the  next  step  the  communication  structure  between  the  hardware  and 
the  software  modules  is  synthesised,  i.e.  abstract  communication  operators  are  replaced  by 
connections  using  busses,  busses  with  shared  memory  or  FIFO  buffers  [5].  After 
communication synthesis all constraints are checked. If some are violated, the system has to be
repartitioned. Otherwise it is synthesised and compiled to the final architecture (ASICs, DSPs
and microcontrollers) or to the DICE rapid prototyping platform (REPLICA) [7].

In the first part of this paper the integration of an optimisation algorithm based on tabu
search [1], [2], [3] into the HiPART system is presented. Initially, a simulated annealing
algorithm was implemented for partitioning, but it was very time consuming. In
comparison, the tabu search algorithm has the advantage that it is deterministic and that it is
more appropriate for the class of partitioning problems [2] within DICE, i.e. the achieved
solutions are of nearly the same quality, but the computation time is much shorter.

The  second  part  focuses  on  a  new  hardware  compilation  method  based  on  high-level 
loop  transformations.  These  code  optimisations  are  transferred  and  adapted  from  parallelising 
compilers  into  the  synthesis  process  of  systems  with  a  generic  hardware  architecture.  Since 
many  algorithms  spend  about  three  quarters  of  the  total  execution  time  in  loops,  an  optimal 
trade-off  between  the  execution  time  and  the  area  for  the  hardware  has  to  be  found.  Finally, 
the  optimised  loop  nest  is  mapped  onto  a  generic  hardware  architecture. 
Fig. 1. Design Flow in the DICE Environment


II.  HiPART:  Fast  Hardware  /  Software  Partitioning 

HiPART is a new hierarchical hardware / software partitioning approach. First of all the
initial system (co-)specification (VHDL/C) is converted to a CCDFG [6]. In the pre-clustering
step the granularity of the partitioning is fixed. Since the complexity of single operations or
basic blocks may vary over a wide range, a heterogeneous granularity concept has been realised.
In order to reduce the complexity of the optimisation problem, the number of clusters is
reduced to a user-specified number in a subsequent automated clustering step. During
partitioning,  the  graph  and  resulting  partitions  can  be  visualised  with  a  graphical  user  interface, 
which  enables  the  designer  to  control  the  design  process  interactively.  In  the  main  partitioning 
step  the  assignment  of  the  tasks  to  the  partitions  is  improved  in  such  a  way,  that  a  defined 
parametrised  cost  function  is  minimised.  Because  of  the  solution  space  reduction  in  the 
clustering  step  it  is  not  guaranteed  that  the  optimisation  algorithm  will  find  the  global  optimum. 
Therefore  the  clusters  are  resolved  and  post-partitioned  during  the  iterative  refinement. 


III. Adapted Tabu Search Algorithm for fast Partitioning with HiPART

The task of the HiPART module is to find a partitioning such that a given
user-parametrisable cost function is minimised and specified constraints are met. In order to save
computation time, only the cost difference resulting from a movement of operations between two
partitions is computed. In this context the application of genetic algorithms would be very
inefficient,  since  they  contain  operations  (e.g.  the  crossover  operation)  which  require  a 
complete  recomputation  of  the  cost  function.  Since  simulated  annealing  is  able  to  find  a  global 
optimum  if  the  cooling  schedule  is  handled  properly,  this  algorithm  is  an  appropriate  approach 
for  solving  the  partitioning  problem.  Furthermore  it  may  be  used  to  evaluate  the  quality  of  the 
results  of  other  partitioning  heuristics.  Because  of  the  slow  temperature  decrease  and  a 
required  minimum  length  of  Markov  chains,  the  optimisation  process  is  very  time  consuming. 
Therefore, a tabu search algorithm [1], [2] has additionally been implemented to solve the
partitioning problem. In order to prevent the tabu search algorithm from getting stuck at a local
optimum, a deterministic diversification method has been integrated. The number of iteration
steps is counted. If there is no improvement over the best solution found after a
user-defined number of iteration steps, a diversification mechanism is applied. During
diversification, operations which have not been touched for a long time are moved in order to
enter a different area of the search space. The number of diversification steps is chosen such
that each task is moved at least once. The algorithm terminates if no improvement of the best
solution is achieved during a predefined number of optimisation/diversification cycles. Tabu
search is well suited to replace the initially applied simulated annealing algorithm, since it also
requires only incremental cost updates. If the tabu search algorithm is handled properly (tabu
length, tabu exceptions, diversification), results of the same quality as with
simulated annealing can be achieved. With this alternative approach it was possible to reduce the
computation time for a problem with 887 partitioning nodes from an average of 150
minutes with simulated annealing to about 10 minutes [3].
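The interplay of incremental cost updates, tabu list and deterministic diversification can be sketched for a simple two-way partitioning problem. This is not the HiPART algorithm: the cost function (cut weight plus an imbalance penalty), the parameter names and the choice to move half of the nodes during diversification are assumptions made only to keep the example small.

```python
def tabu_partition(n, edges, max_iters=200, tabu_len=3, stall_limit=10,
                   balance_weight=1.0):
    """Two-way partitioning of nodes 0..n-1 by tabu search.

    Cost = weight of cut edges + balance_weight * |size(0) - size(1)|.
    Each candidate move is evaluated through its cost difference only.
    """
    adj = {v: [] for v in range(n)}
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))

    side = [v % 2 for v in range(n)]          # deterministic initial partition
    sizes = [side.count(0), side.count(1)]

    def delta(v):
        """Cost difference if node v moves to the other partition."""
        d_cut = sum(w if side[u] == side[v] else -w for u, w in adj[v])
        s = side[v]
        new_imbalance = abs((sizes[s] - 1) - (sizes[1 - s] + 1))
        return d_cut + balance_weight * (new_imbalance - abs(sizes[0] - sizes[1]))

    def move(v):
        s = side[v]
        sizes[s] -= 1
        sizes[1 - s] += 1
        side[v] = 1 - s

    # The full cost is computed once; afterwards only differences are added.
    cost = (sum(w for u, v, w in edges if side[u] != side[v])
            + balance_weight * abs(sizes[0] - sizes[1]))
    best_cost, best_side = cost, side[:]
    tabu_until = [0] * n      # iteration from which a node may move again
    last_moved = [0] * n      # iteration at which a node was last moved
    stall = 0

    for it in range(1, max_iters + 1):
        if stall >= stall_limit:
            # Deterministic diversification: move the least recently moved nodes.
            order = sorted(range(n), key=lambda v: (last_moved[v], v))
            for v in order[: max(1, n // 2)]:
                cost += delta(v)
                move(v)
                last_moved[v] = it
            stall = 0
        else:
            # Non-tabu moves, or tabu moves that beat the best cost (tabu exception).
            candidates = [(delta(v), v) for v in range(n)
                          if tabu_until[v] <= it or cost + delta(v) < best_cost]
            if not candidates:
                continue
            d, v = min(candidates)
            cost += d
            move(v)
            tabu_until[v] = it + tabu_len
            last_moved[v] = it
            stall += 1
        if cost < best_cost:
            best_cost, best_side, stall = cost, side[:], 0
    return best_cost, best_side

# Two triangles (edge weight 5) joined by one light edge (weight 1).
example_edges = [(0, 1, 5), (0, 2, 5), (1, 2, 5),
                 (3, 4, 5), (3, 5, 5), (4, 5, 5), (2, 3, 1)]
best_cost, best_side = tabu_partition(6, example_edges)
```

For this small graph the search separates the two triangles and cuts only the light edge. The sketch runs for a fixed number of iterations instead of stopping after a number of unsuccessful optimisation/diversification cycles.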

IV.  Hardware  Compilation  Techniques  and  Mapping  onto  Hardware 

This new module enables the designer to optimise the data-flow-intensive part of a hardware /
software system. Since this part is very computation intensive, it will be realised in hardware,
and it is assumed that the specification is given in VHDL. Loops are the most computation-intensive
parts of an algorithm. Therefore the task of this module consists of mapping a perfect
loop nest to an optimised parallelised hardware architecture, i.e. finding an optimal trade-off
between time and area. To this end, the index space and the index functions are extracted
from the loop nest. On this basis the dependences between the array references, the
dependence vectors and the dependence matrix are determined. In the next step the sequential
loop nest is transformed to be executed in parallel. This can be done by a unimodular
transformation which first converts it to the maximal degree of parallelism [4], [8], followed by
a coordinate transformation of the index space to a loop of a smaller degree of parallelism.
This method has the advantage that it is unimodular, but the loop cannot be mapped to all
degrees of parallelism. Therefore we propose to generalise the wavefront method [9], i.e.
not only the index points lying on one wavefront are scheduled to one time step, but also
those lying in the interval between two wavefronts. The advantage of this method is that a
mapping to all degrees of parallelism is possible. Now the relation between the number of time
steps and the number of processing elements corresponding to the degree of parallelism can be
fixed, and the optimal trade-off has to be found depending on the application area of the
circuit. In order to find this optimum, the hardware architecture onto which the parallelised code
is mapped has to be considered (Figure 2). Since the memory has to be accessed in parallel,
it is subdivided into subRAMs and the data items are dynamically assigned to the processing
elements by an array of multiplexers. In the next step the data are processed and in the third
step they are written back to the subRAMs. With the help of this approach it became possible
to map a perfect loop nest to an optimised and parallelised hardware architecture.
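The trade-off between time steps and processing elements can be illustrated with the classic wavefront (hyperplane) schedule. The sketch below is not the generalised method proposed here; it only shows, for a hypothetical 4 x 4 loop nest with dependence vectors (1, 0) and (0, 1), how the choice of the scheduling hyperplane changes the number of time steps and the number of index points that run in parallel. All names are illustrative.

```python
from collections import defaultdict

def wavefront_schedule(n_i, n_j, hyperplane=(1, 1)):
    """Group the index points (i, j) by their time step t = h_i*i + h_j*j."""
    h_i, h_j = hyperplane
    steps = defaultdict(list)
    for i in range(n_i):
        for j in range(n_j):
            steps[h_i * i + h_j * j].append((i, j))
    return dict(steps)

def is_valid(hyperplane, dependence_vectors):
    """A schedule is valid if every dependence advances time by at least one step."""
    return all(hyperplane[0] * d_i + hyperplane[1] * d_j >= 1
               for d_i, d_j in dependence_vectors)

def time_and_area(steps):
    """Return (number of time steps, largest number of points in one step)."""
    return len(steps), max(len(points) for points in steps.values())

# a[i][j] depends on a[i-1][j] and a[i][j-1].
deps = [(1, 0), (0, 1)]
fast = time_and_area(wavefront_schedule(4, 4, (1, 1)))    # (7, 4)
small = time_and_area(wavefront_schedule(4, 4, (2, 1)))   # (10, 2)
```

Both hyperplanes respect the dependences, but the first needs 7 time steps with up to 4 processing elements, whereas the second needs 10 time steps with at most 2.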

V.  Summary 

In  this  paper  we  have  presented  a  fast  approach  for  hardware  /  software  partitioning  and 
a  method  which  maps  a  perfect  loop  nest  onto  an  optimised  and  parallelised  hardware 
architecture. 
Fig. 2. Final System
Abstract. The purpose of this paper is to present how to design the
memory points of digital integrated circuits intended to operate in space. The
proposed methodology relies on the use of an integrated Built-In Self Protection
(BISP). This approach can prevent the loss of information due to heavy-ion hits
while preserving performance.


I.  Introduction 

Integrated Circuits (IC) operating in space and in the upper atmosphere are exposed to
particles. Some of them, such as heavy ions, are able to induce an electrical upset in CMOS
digital circuits [1]. In combinational logic, this transient phenomenon can cause a false bit to
be stored in sequential logic. Memory elements are responsible for this soft error [2-4]. Since
the upset susceptibility of digital circuits increases with the number of memory elements, memory
hardening is the key to digital IC hardening.

Moreover, performance, size reduction, low power and low cost always remain essential
for integrated circuits despite the perturbing phenomenon. Therefore, hardening
at the design level, even on non-hardened commercial CMOS processes, has received attention in
the past, notably for memory elements [5-7]. However, no solution fulfills all requirements. The
drawbacks are high complexity, large area, increased writing time and/or static
power consumption.

To overcome the impediments related to the high-energy ion radiation environment, we
propose and put into practice a design-based hardening methodology that is applicable to a
standard technology (bulk CMOS).

II.  Phenomenon  description 

When a heavy ion hits a reverse-biased junction, charges are generated along the ion
track and collected. A transient current pulse is then induced. To point out its random nature
and to distinguish it from the total ionizing dose, another perturbing effect, this effect is
called Single Event Upset (SEU). In digital circuits, the SEU appears as an uncontrolled logic
level upset. If the phenomenon happens in a latch (Fig. 1), then the storage of an erroneous bit
becomes possible due to the loop structure of the cell.

The sensitive MOS areas are sketched in Fig. 1 for Q=N0=0 and Qb=1. In these
conditions, upsets are possible only at some junctions of the (m2,m4,mtg2,mtg4)
transistors. In the alternate conditions (Q=N0=1 and Qb=0), some junctions of
(m1,m3,mtg1,mtg3) become sensitive instead.

An HSPICE simulation using a current pulse generator [5,7] at the node NO shows
the electrical response (Fig. 2) and illustrates how a change in the stored bit can happen.


Fig.  1.  Standard  Latch  and  the  sensitive 
areas  pointed  out  for  Q  at  the  low  level 


Current  pulse  at  node  NO 


Fig.  2.  Post-layout  Spice  simulation  of  a 
SEU  effect  at  the  node  NO  showing  the 
standard  latch  sensitivity  (current  pulse: 
2  mA  amplitude,  200  ps  duration) 


When an upset occurs at the node NO, its voltage switches from a low level to a high
level. Consequently, the upset propagates through the inverters (m1,m2) and (m4,m3),
implying Qb=0 and Q=1. A false value is then memorized. We note that a charge of only 0.4 pC
generated in the transistor mtg2 is sufficient to induce an error (Fig. 2).
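The 0.4 pC figure agrees with the pulse parameters given in the caption of Fig. 2 if the current pulse is taken to be rectangular, which is an assumption of this short check (the function name is illustrative):

```python
def pulse_charge(amplitude_a, duration_s):
    """Charge in coulombs carried by a rectangular current pulse: Q = I * t."""
    return amplitude_a * duration_s

# 2 mA amplitude, 200 ps duration (values from the caption of Fig. 2).
charge_pc = pulse_charge(2e-3, 200e-12) * 1e12   # charge in picocoulombs
```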

Recently, another kind of perturbing event has been demonstrated on the latch structure [4]:
an upset that is not generated in the cell itself but occurs at the Dlatch input. In this
condition, the transient pulse is seen by the Dlatch as a normal change in the input logic value.
If the upset arises as the clock is switching off, the writing operation can be jeopardized.
During a time interval called the "window of vulnerability", the cell can store a false bit. This
effect, arising during a transition, is termed "dynamic SEU". In contrast, the case of
Fig. 2, in which the pulse arises in the Dlatch structure while the cell is working as an
isolated memory point, will be referred to as "static SEU".

III.  Design  methodology  improving  SEU  hardening 

Our design methodology is illustrated in Fig. 3. It relies on four criteria. (i)
Only a reverse-biased junction can be upset. As a consequence, an nMOS storing a 0 and a pMOS
storing a 1 have their value preserved. (ii) The writing operation of the latch must be done
with two signals, D and Db. This introduces a necessary redundancy. (iii) Resistors permit
RC filtering of the upset, so that the upset cannot propagate. (iv) Finally, inverters are
replaced by tri-state inverters able to put Q and Qb in a high-impedance state during an upset.

To prevent possible upsets on the Q or Qb nodes, the RC filtering inhibits propagation in
the loop (Fig. 3). For an upset generated at the nodes G or Gb, the loss of information is
prevented by putting Qb or Q in a high-impedance state during the disturbance.
Fig. 3. Design methodology used to improve latch SEU hardening

Fig. 4. Layout of a hardened latch


If one of the two latch inputs is upset during a clock transition, a dynamic SEU is
transmitted. The erroneous value has to propagate through the inverter and the resistor of the
latch. The delay necessary for this propagation, combined with the presence of the dual input,
produces a very effective hardening.

IV. Results

Fig. 5. Cross sections of standard and hardened latches

Fig. 6. Rate of reduction of the cross-section versus Linear Energy Transfer

A hardened latch based on the previous methodology [7] (Fig. 4) and a standard latch
have been implemented in a test circuit (0.7 µm bulk-CMOS technology). The results of the
irradiation performed at the Cyclotron of the Lawrence Berkeley Laboratory are given in Fig.
5, where the SEU cross section is the number of errors divided by the particle fluence
(incident ions per cm²). The rate of decrease in the cross section with respect to the standard
latch is shown in Fig. 6. The Linear Energy Transfer (LET) measures the energy efficiency of
the particles.
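The cross-section definition above can be written as a short calculation. The error counts and the fluence below are hypothetical values, not the measured data, and expressing the rate of reduction as a fraction of the standard-latch cross section is an assumption about how Fig. 6 is defined.

```python
def seu_cross_section(error_count, fluence_per_cm2):
    """SEU cross section in cm^2: number of errors divided by the particle fluence."""
    if fluence_per_cm2 <= 0:
        raise ValueError("fluence must be positive")
    return error_count / fluence_per_cm2

def reduction_rate(sigma_standard, sigma_hardened):
    """Fractional reduction of the hardened cross section relative to the standard one."""
    return 1.0 - sigma_hardened / sigma_standard

# Hypothetical run: 1e7 ions/cm^2, 500 errors (standard latch), 5 errors (hardened latch).
sigma_standard = seu_cross_section(500, 1e7)
sigma_hardened = seu_cross_section(5, 1e7)
rate = reduction_rate(sigma_standard, sigma_hardened)
```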




These figures clearly show that up to an LET of 30 MeV·cm²/mg, only the standard
latch makes errors. Above this LET threshold, the soft errors are still considerably reduced,
but multiple event effects can appear.


                          Standard Latch    Hardened Latch
area                      600 µm²           790 µm²
writing time              600 ps            710 ps
sensitive junction area   106 µm²           91 µm²
Hardening                 un-hardened       SEU hardened

Tab. I. Comparative characteristics
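The relative overheads implied by Tab. I can be checked with a few lines. The function name is illustrative, and the results are rounded to one decimal place in the comments.

```python
def relative_change_percent(standard, hardened):
    """Change of the hardened value relative to the standard value, in percent."""
    return (hardened - standard) / standard * 100.0

area_overhead = relative_change_percent(600, 790)          # area: about +31.7 %
write_time_overhead = relative_change_percent(600, 710)    # writing time: about +18.3 %
sensitive_area_change = relative_change_percent(106, 91)   # sensitive area: about -14.2 %
```

These values correspond to the rounded figures of roughly 30 %, 18 % and 14 % quoted in the conclusion.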


V.  Conclusion 

We have presented a design methodology devoted to memory points that is able to introduce a
Built-in Self Protection (BISP) in SRAM, Dlatches, registers, etc., without any technology
artefact. A circuit has been implemented to put these new concepts into practice and validate
them in a commercial bulk technology. The irradiation results show a significant improvement of
the SEU hardening capabilities. Moreover, the electrical simulations performed show that the
writing speed (18% decrease) and the consumption of the tested cell are preserved. The cost in
silicon area is only a 30% loss, compared to the two to three hundred percent of a redundant system.
The structure came with a surprising 14% reduction of the sensitive area, confirming the
design feasibility of safer devices.
Abstract. Today’s multimedia applications represent complex systems implemented
on a single piece of silicon. Industrial experiments show that the
design of such applications has to start at system level in order to fulfill increasing
requirements for area and power reduction. Typically, reduction of the
area and power of the chip includes a number of sophisticated, but often manual
optimisations applied to the initial specification. Those optimisations are
very error-prone. In this paper, we propose a system-level methodology for
functional verification of loop-oriented transformations on multimedia applications.
The methodology is based on the combination of two complementary
existing techniques: formal verification of loop transformations and SFG-Tracing.

I.  Introduction 

One  of  the  dominant  requirements  for  current  system-on-a-chip  multimedia  applications  is  the 
need  for  low  power  consumption  and  reduction  of  the  area  of  the  chip.  Traditional  methods 
solve  this  problem  by  optimisations  at  lower  levels  of  design,  usually  exploiting  a  number  of 
circuit  techniques.  Today’s  multimedia  applications,  however,  represent  typical  data-dominated 
systems, where the dominant power contribution comes from the data transfers between the
datapath and the hierarchical memories. Late low-level optimisations do not bring any significant
gain in terms of power without first considering the reduction of wasted memory transfers.
This  fact  has  to  be  taken  into  account  already  at  the  early  stages  of  the  design  process  [10]. 
In  practice  this  means  that  many  important  decisions  have  to  be  taken  already  at  the  system 
(algorithmic)  level.  The  search  space  for  looking  for  the  promising  algorithm  descriptions  at 
this  level  is  very  broad.  In  order  to  avoid  the  selection  of  good  candidates  on  an  ad-hoc  basis 
with  the  expectation  that  their  final  implementation  will  satisfy  the  user  requirements,  novel 
methodologies  are  researched  (see  e.g.  [2])  that  allow  the  designer  to  meet  the  power  and  area 
constraints  by  exploring  the  initial  algorithmic  description.  Code  transformation  techniques  are 
a crucial part of such methodologies. They allow the initial descriptions to be manipulated in
order to reduce costly memory transfers [6].

Loop transformations applied to code spanning a number of pages can be very error-prone,
however, especially when applied manually. In order to guarantee the correctness of the
transformed code, the optimised descriptions should always be compared with the initial ones.

This paper proposes a complete functional verification methodology for multimedia applications
at system level, based on the combination of two complementary verification techniques:
verification of loop transformations and SFG-Tracing. We first review and compare both
verification techniques in order to clarify their principles, advantages and shortcomings. Then,
their combination is explained in several steps. Concluding remarks are presented at the end of
the paper.

II. Methodology for Verification of Loop Transformations

For a large class of system-level transformations in data-dominated signal processing
applications, a technique for verifying the behavioural equivalence of any two high-level
descriptions has been developed at IMEC [9]. It is based on geometrical domain modelling of each
statement in the description.

A formal model [9] is proposed with the aim of modelling the loop constructs and indexed signals
in a very efficient way. Each statement in the description is defined by a predicate (P.R),
consisting of a relation (R) between the signals involved and a precondition (P) under which the
relation holds. The relation [7, 8] is a unique representation of the statement, made independent
of the original index names in the description. The precondition is derived from the manifest
constraints on the indices of the loop bounds surrounding the statement. It defines the index
domain for the relation (statement). The index domain is then represented in a geometrical way.
The complete description is modelled by the conjunction of all the predicates representing the
statements within the description.

The  basic  loop  verification  technique  consists  of  three  steps  [9]: 

1. Joining the relations. In the first step, the corresponding statements are identified and
their preconditions are merged in both the initial and the optimised description. Corresponding
statements are statements that have been distributed over the loops as a result of
optimisations such as loop unrolling or loop splitting. Their identification is based on simple
syntactical pattern matching of the signal names in the statement. If the names are equal,
the relations of the statements are equal. If the relations are equal, the two constrained
relations are joined into one new constrained relation by merging the preconditions of the
two relations.

2. Reduction of the domains. In this step the merged preconditions are rewritten into
canonical form, independent of the original iterator names. Their domain representations are
then reduced to a definition in which only free variables remain.

3. Computation of the difference of the domains. In order to prove that the two descriptions
have the same behaviour, the sets of predicates, one for the initial and the other for the
optimised description, have to be proven equal. The corresponding predicates in
the two descriptions are again found by pattern matching of the signal names. The equivalence
of their preconditions is checked by computing the difference of the domains they define.
The two related preconditions must define the same domain. If the result of the
difference is an empty domain, the preconditions are the same and the description is correct;
otherwise an error was introduced during the transformation phase.
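The three steps above can be sketched in a few lines of Python. This is only an illustration under strong simplifying assumptions: index domains are enumerated as explicit sets of integer points rather than modelled geometrically as in [9], relations are plain strings that are assumed to be already written in a canonical form, and the helper names (`domain`, `join`, `domain_differences`) are hypothetical.

```python
from itertools import product

def domain(bounds, guard=lambda *idx: True):
    """Enumerate the index points of a loop nest; bounds = [(lo, hi), ...], hi exclusive.
    Enumeration stands in for the geometrical domain model (an assumption of this sketch)."""
    return {idx for idx in product(*(range(lo, hi) for lo, hi in bounds)) if guard(*idx)}

def join(statements):
    """Step 1: merge the preconditions (index domains) of statements with equal relations."""
    joined = {}
    for relation, dom in statements:
        joined.setdefault(relation, set()).update(dom)
    return joined

def domain_differences(initial, optimised):
    """Step 3: return the non-empty domain differences; an empty dict means equivalent."""
    a, b = join(initial), join(optimised)
    diffs = {}
    for relation in a.keys() | b.keys():
        diff = a.get(relation, set()) ^ b.get(relation, set())  # symmetric difference
        if diff:
            diffs[relation] = diff
    return diffs

# Initial description: for i in 0..7: b[i] = f(a[i])
initial = [("b[i]=f(a[i])", domain([(0, 8)]))]
# Correct loop split into two loops, and a faulty split that skips i = 4.
split_ok = [("b[i]=f(a[i])", domain([(0, 4)])), ("b[i]=f(a[i])", domain([(4, 8)]))]
split_bad = [("b[i]=f(a[i])", domain([(0, 4)])), ("b[i]=f(a[i])", domain([(5, 8)]))]

print(domain_differences(initial, split_ok))   # no differences reported
print(domain_differences(initial, split_bad))  # reports the missing index point
```

Step 2 (reduction to canonical form) is implicit here, because points are stored as tuples that do not depend on iterator names. A realistic tool would operate on symbolic domains, since enumeration does not scale to large or parametric loop bounds.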

III. SFG Tracing Methodology

SFG-Tracing [3] is a formal verification methodology that makes it possible to check the correctness
of low-level implementations against system-level behavioural specifications. The basic idea
of this methodology is to divide the initial algorithmic specification, represented as a Signal Flow
Graph (SFG), into a set of partial signal flow graphs (pSFGs). This partitioning
overcomes the complexity problem of verifying the whole design at once. Each pSFG is traced
in the implementation and checked for correctness. For the verification to be feasible and
efficient, a set of correspondence relations between the high-level specification
and the lower-level implementation must be known. These relations are
called reference signals and mapping functions. Reference signals are the input
and output signals of each pSFG and the corresponding signals at a certain time and place in the implementation.
Mapping functions describe the behavioural correspondence of the reference signals between
implementation and specification in space and in time, under specific conditions.

The bit-true implemented behaviour of each pSFG is compared with the expected
one by symbolic simulation making use of Binary Decision Diagrams (BDDs) [1]. Symbolic
simulation is used to check whether the Boolean expression obtained by simulating the
implemented circuit is represented by the same BDD as the one derived from the specification.
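The comparison relies on the fact that, for a fixed variable order, a reduced ordered BDD is a canonical form: two Boolean functions are equal exactly when they reduce to the same graph. The sketch below illustrates only that property. It is a toy manager written for this example (the class `BDD` and the full-adder functions are hypothetical), and it builds each diagram by exhaustive Shannon expansion, which real symbolic simulators avoid.

```python
class BDD:
    """Tiny reduced ordered BDD manager. Nodes are integers; 0 and 1 are the terminals."""

    def __init__(self, num_vars):
        self.num_vars = num_vars
        self.nodes = {0: None, 1: None}   # node id -> (var, low, high)
        self.unique = {}                  # (var, low, high) -> node id

    def mk(self, var, low, high):
        """Create or reuse a node, applying the two reduction rules."""
        if low == high:                   # redundant test: skip the node
            return low
        key = (var, low, high)
        if key not in self.unique:        # share structurally identical nodes
            node_id = len(self.nodes)
            self.nodes[node_id] = key
            self.unique[key] = node_id
        return self.unique[key]

    def build(self, func, var=0, env=()):
        """Shannon expansion of a Python Boolean function over variables 0..num_vars-1."""
        if var == self.num_vars:
            return 1 if func(*env) else 0
        low = self.build(func, var + 1, env + (False,))
        high = self.build(func, var + 1, env + (True,))
        return self.mk(var, low, high)

bdd = BDD(3)
# "Specification": sum bit of a full adder.
spec_sum = lambda a, b, c: (a != b) != c
# "Implementation": the same function written with AND/OR/NOT gates.
impl_sum = lambda a, b, c: ((a or b) and not (a and b)) != c
# A faulty implementation in which the first XOR was replaced by an OR.
buggy_sum = lambda a, b, c: (a or b) != c

print(bdd.build(spec_sum) == bdd.build(impl_sum))   # same root node: equivalent
print(bdd.build(spec_sum) == bdd.build(buggy_sum))  # different root node: not equivalent
```

Because both functions are built in the same manager, equivalence checking reduces to comparing two integers.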

IV.  Comparison  of  Verification  Methodologies 

As mentioned above, SFG Tracing is a general verification methodology that makes it possible to verify
the behavioural correctness of an implementation against its specification at different abstraction
levels. SFG Tracing has mainly been used for verification between the register-transfer and
transistor levels and was then extended towards the algorithmic level [4]. At the algorithmic
level, verification of loop constructs by means of mathematical induction has also been
proposed, but this approach is limited because induction-based proofs of loop constructs
impose restrictions on the maximum number of nested loops. The number of proofs needed to
prove the induction case grows exponentially with the nesting depth of the loop. Moreover,
the technique was only suitable when the two compared descriptions had the same number of loops,
with regular loop structure and equal ordering of indexed signals. In the case of loop transformations,
loop unrolling was necessary. This is not acceptable for the data-dominated applications we
target here, in which a number of multidimensional signals are enclosed in differently nested
loops.

On the other hand, the methodology for loop transformation verification has been used for
verification of two algorithmic descriptions, but arithmetic and related control in the data-paths were
not checked, because the formal model was not intended to express their behaviour. However,
the shortcomings of both methods can be eliminated by combining them, thus
allowing the complete verification of system-level optimisations.

V. Combination of SFG-Tracing and Loop Transformation Verification

The proposed overall methodology for system-level verification consists of several steps. Before
the actual verification, the specification has to be hierarchically partitioned into the part that will be
affected by the code transformations, which is verified by the first technique, and the part which includes
the scalar arithmetic and control operations, to which the SFG Tracing technique is applied. This
partitioning has to be preceded by careful code exploration. These steps, together with a short
explanation of the memory optimisation steps, are described in this section.

A.  Code  exploration 

During  the  design  of  complex  hardware  and  software  (or  mixed)  systems,  the  designer  has 
a  number  of  possibilities  to  modify  the  initial  design  specification  in  order  to  obtain  a  better 
system  implementation.  For  complex  multimedia  applications,  exploration  of  all  possibilities 
is  not  feasible.  The  designer  has  to  concentrate  only  on  the  optimisations  of  the  most  relevant 
part  of  the  code  and  suppress  the  parts  that  are  not  important.  This  can  be  achieved  by  pruning 
of  the  original  specification  where  only  the  most  promising  parts  for  optimisation  are  selected 
[2]. From the point of view of memory optimisations, these parts represent code describing the
communications  between  the  data-path  and  memory,  which  are  recognized  after  a  high-level 
estimation  of  the  number  of  main  memory  accesses  (profiling),  possibly  making  use  of  efficient 
access  counting  tools.  The  rest  of  the  code  which  is  not  the  object  of  the  system-level  loop 
optimisations  represents  the  data-path  and  control. 

B. Hierarchical specification modeling

As mentioned above, hierarchical partitioning of the specification is very important to model
the separation of the optimisation-related part and the data-path-related part of the initial
specification. In particular, the specification has to be split into three layers:

layer 1: a procedural process control flow top layer which is not of real interest for
optimisations, except for the potential sequence/timing constraints which are imposed on the
memory transfers.

layer 2: a middle layer which represents all the relevant information for the optimisation tasks.
This layer contains the multidimensional data flow to be stored/transferred in background
memory.

layer 3: a bottom layer of local data-path/control functions which contains scalar data and
local control flow. This layer hides all the foreground and arithmetic/control issues.
Moreover, it is also used to hide some of the undesired constructs which are not of real interest
to the system-level optimisation tasks.

Within  this  model,  all  arithmetic,  scalars  and  code  constructs  that  are  not  important  from  the 
point  of  view  of  optimisations  at  system-level  are  hidden  in  the  lowest  level  functions.  Those 
are  called  from  higher  levels  in  the  middle  layer  where  the  M-D  data  flow  is  expressed  by  means 
of  loops  and  indexed  signals.  Only  this  middle  layer  part  of  code  is  targeted  to  the  optimisation 
considered  here  because  only  this  contains  the  relevant  information. 
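As a purely illustrative example of this layering (the function names and the clipping operation are invented for the sketch and do not come from any real application), a small specification could be organised as follows:

```python
def clip_add(x, y, limit=255):
    """Layer 3: scalar data-path function; the arithmetic detail is hidden here."""
    return min(x + y, limit)

def blend(frame_a, frame_b):
    """Layer 2: multidimensional data flow expressed with loops and indexed signals.
    Only this layer would be the target of the loop transformations."""
    rows, cols = len(frame_a), len(frame_a[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            out[r][c] = clip_add(frame_a[r][c], frame_b[r][c])
    return out

def process(frames):
    """Layer 1: process control flow that fixes the order of the layer-2 calls."""
    result = frames[0]
    for frame in frames[1:]:
        result = blend(result, frame)
    return result

print(process([[[100, 200]], [[100, 100]]]))
```

In this arrangement a loop transformation such as interchanging the `r` and `c` loops touches only `blend`, while `clip_add` can be checked separately as a scalar function.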

C.  Optimisation  Steps 

After  layering,  the  optimisation  steps  of  the  data  transfer  and  storage  exploration  (DTSE) 
methodology  [2]  can  be  applied.  These  represent  a  hierarchical  combination  of  procedures, 
each of them addressing a different task in the optimisation process. Most of the steps involve
global,  complex  loop  manipulations.  Also  a  parallelizing  compiler  approach  could  be  applied. 

D. Verification

Exploiting the layered specification model described in section B, the input specification
can be divided into three different layers. This allows the code to be separated into the
DTSE-related and data-path-related partitions. The descriptions covered at the different levels are then
verified by the appropriate technique as described in the following procedure:

1. The first layer is inlined into the second layer, leading to a single-layer description. This
joined layer is then verified fully by the method for verification of loop transformations
described in section II. We exploit our novel extensions of the method [5], which can
significantly reduce the computation time, in order to cope with complex initial
multimedia specifications.

2. Functions in the third layer are matched in the two compared specifications and
verified by SFG Tracing, because the problematic (nested) loop constructs are not present
there. Because both the specification and the implementation lie at the same (algorithmic)
abstraction level, the mapping functions can be found by matching the signal
names on the left and right sides of the statements.

If the functions in layer 3 are equivalent, the verification can concentrate on layer 2
and layer 1; otherwise the errors in these layer-3 functions have to be resolved before proceeding
to the verification of layer 2.


VI.  Conclusions 

We propose a methodology for the formal verification of data-dominated multimedia applications
at system level. It is based on the combination of two complementary existing techniques:
verification of system-level loop transformations and the SFG Tracing technique for verifying
the correctness of arithmetic constructs and related control flow. This exploits the
benefits of both techniques to obtain a complete formal verification technique at a level which
has not been fully covered by other validation techniques. Although it is not illustrated in this
paper, the verification technique has been successfully applied to real-life designs, showing its
ability to verify the complex transformations applied to multimedia applications.

Analog (S)witchcraft, or How to Perform Accurate and Linear
Data  Conversion  Using  Inaccurate  Nonlinear  Elements 

Gabor  C.  Temes,  Un-Ku  Moon  and  Jesper  Steensgaard 
Oregon State University, Corvallis, OR 97331, USA

Abstract. This paper provides a tutorial overview of some recently developed
methods for enhancing the accuracy and linearity of data converters (analog-to-digital
as well as digital-to-analog) by introducing auxiliary digital circuitry which
calibrates, cancels and/or corrects the errors introduced by the unavoidable inaccuracy
of the analog components used in the conversion. Simple but practical examples
are used to illustrate the various improvement techniques.

I.  Introduction 

Conventional high-accuracy data converters require extreme accuracy in the matching of
analog components, which cannot be achieved in an integrated circuit. In this paper, three
strategies (analog correction, error cancellation and spectral shaping) are described for achieving
accurate dynamic component matching. Switched-capacitor (SC) DACs will be used to illustrate
the techniques.

II.  Analog  Correction  Techniques 

Fig. 1 shows the conceptual diagram of a DAC constructed using SC circuitry. The operation
of the circuit is as follows. The input binary word is converted into a thermometer code with bits
$x_1, x_2, \ldots, x_M$ such that if the integer value of the input word is $m$, the bits $x_1, x_2, \ldots, x_m$ are
1, and the rest are 0. During the reset phase ($\phi_1 = 1$), the feedback capacitor $C_f$ is discharged
and all input capacitors are charged to the reference voltage $V_{ref}$. Next, during the conversion
phase ($\phi_2 = 1$), the first $m$ input capacitors are discharged into $C_f$, resulting in an output voltage
proportional to the charge delivered by these $m$ capacitors.


Figure  1:  A  switched-capacitor  DAC 
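The thermometer coding and the resulting charge transfer can be sketched as follows. The output expression used here, the total charge of the selected capacitors divided by $C_f$, is an assumption based on charge conservation; the sign of the output and all non-ideal effects are ignored, and the function names are hypothetical.

```python
def thermometer(m, M):
    """Thermometer code x_1..x_M for integer input m: the first m bits are 1."""
    return [1 if i < m else 0 for i in range(M)]

def sc_dac_output(m, caps, c_f, v_ref=1.0):
    """Output magnitude when the selected input capacitors, precharged to v_ref,
    are discharged into the feedback capacitor c_f (assumed: V = Q / c_f)."""
    x = thermometer(m, len(caps))
    charge = sum(xi * ci * v_ref for xi, ci in zip(x, caps))
    return charge / c_f

# Ideal case: four unit capacitors and a feedback capacitor of four units.
print(thermometer(3, 5))
print(sc_dac_output(2, [1.0, 1.0, 1.0, 1.0], c_f=4.0))

# Mismatched capacitors make the output deviate from the ideal m/M characteristic.
print(sc_dac_output(2, [1.02, 0.97, 1.01, 1.00], c_f=4.0))
```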

Analog correction of the matching error may be achieved by using the system shown in
Fig. 2, where each capacitor $C_i$ is split into a coarse and a fine part, $C_{ci}$ and $C_{fi}$, respectively,
and a separate buffered reference voltage $V_{refi}$ is introduced for each $C_{fi}$ [1]. When $\phi_2$ is
high, capacitors $C_{ci}$ and $C_{fi}$ are discharged into $C_f$ if $x_i = 1$, otherwise they hold their charges.
A calibration stage, consisting of a transconductor and a reference capacitor $C_{ref}$, is used to
readjust the $V_{refi}$ sequentially when the $i$-th calibration clock phase is high, so as to make
the combined charges stored in $C_{ci}$ and $C_{fi}$ equal to $C_{ref} V_{ref}$. To replace the capacitor being
calibrated, an extra set of $C_c$ and $C_f$ is also needed. The resulting conversion accuracy can then
be as high as 15 bits. The process is similar to that proposed earlier for current-mode DACs by
Groenewald et al. [2].


III. Error Cancellation Techniques

Error cancellation techniques are similar to analog correction in the sense that analog
quantities (charge, voltage, etc.) are manipulated under digital control to achieve error cancellation.

Consider the DAC stage containing two equal-valued capacitors, shown in Fig. 3 [3]. Its
operation under ideal conditions is as follows. The digital input words are entered serially, with
the least significant bit (LSB) first. Before each word enters, both capacitors are discharged by
the reset switches. Then, when $\phi_1 = 1$, $C_1$ is charged to a voltage $V_{ref}$ or 0, depending on
the LSB. Next, as $\phi_2 = 1$, $C_1$ and $C_2$ share charges. Afterwards, $C_1$ is disconnected from $C_2$,
and again is charged to $V_{ref}$ or 0, depending on the value of the second LSB. This procedure is
repeated for each bit, until the MSB has been processed. At this point the charge stored in both
$C_1$ and $C_2$, and hence the voltage across them, represents the converted value of the input
digital word.


In practice, the capacitors used cannot be made exactly equal, and hence the conversion
becomes inaccurate. This introduces a deterministic nonlinearity into the process, which gives
rise to harmonic distortion. We can quantify the imperfect capacitor matching property by
defining the error coefficient $\delta = (C_1 - C_2)/(C_1 + C_2)$ for the nominally matched capacitors
$C_1$ and $C_2$. Analyzing in detail the charge transfers that occur for the $N$ clock cycles, an explicit
formula can be found for the error err of the final DAC output.

A simple way to perform capacitor mismatch error cancellation is to repeat the conversion
for the same input word with the roles of capacitors $C_1$ and $C_2$ interchanged. This changes the
sign of $\delta$, while leaving the rest of the formula giving err unchanged. Hence, when the two
outputs obtained in the two conversions for the same input word are added together, the effect
of the capacitor mismatch error cancels. Thus, at the cost of doubling the conversion time, the
accuracy is much enhanced. Capacitor mismatch error cancellation schemes using this property
were suggested for a number of different architectures [4]-[5].
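A small numerical experiment can make the cancellation visible. The sketch below is a behavioural model written for this illustration, not the circuit analysis of the cited references: it assumes ideal switches and pure charge sharing, with $C_1 = 1 + \delta$ and $C_2 = 1 - \delta$ in arbitrary units so that $(C_1 - C_2)/(C_1 + C_2) = \delta$. In this model the part of the error that is linear in $\delta$ cancels, and a much smaller residue of order $\delta^2$ remains.

```python
def serial_dac(word, n_bits, delta, swapped=False, v_ref=1.0):
    """Two-capacitor serial DAC, LSB first; returns the final shared voltage."""
    c1, c2 = 1.0 + delta, 1.0 - delta     # (c1 - c2) / (c1 + c2) == delta
    if swapped:                           # interchange the roles of the capacitors
        c1, c2 = c2, c1
    v2 = 0.0                              # both capacitors are reset before the word
    for k in range(n_bits):               # k = 0 is the LSB
        bit = (word >> k) & 1
        v1 = bit * v_ref                  # phase 1: C1 is charged to Vref or 0
        v2 = (c1 * v1 + c2 * v2) / (c1 + c2)   # phase 2: charge sharing
    return v2

word, n_bits, delta = 0b10000001, 8, 1e-3
ideal = word / 2 ** n_bits
err_direct = serial_dac(word, n_bits, delta) - ideal
err_swapped = serial_dac(word, n_bits, delta, swapped=True) - ideal
err_combined = (err_direct + err_swapped) / 2   # average of the two conversions

print(err_direct, err_swapped, err_combined)
```

The two individual errors have nearly equal magnitude and opposite sign, so their average is smaller by several orders of magnitude for this input word.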

IV. Spectral Error Shaping

In spectral error shaping, the error signal generated by the mismatch of nominally
equal-valued elements is filtered so as to suppress its in-band spectral energy. Since the filtering is
usually only first or second order, this technique will only be effective if the signal band occupies
only a relatively small part of the 0 to $f_s/2$ range, where $f_s$ is the sampling frequency, i.e., if
the oversampling ratio $R = f_s/(2 f_b)$ is much greater than 1. Here, $f_b$ denotes the bandwidth
of the signal being converted. The filtering action can be obtained by appropriately choosing
the indices of the matched elements participating in the conversion of each signal sample. One
option is to choose the capacitors used in converting each input sample randomly, rather than
deterministically as described above [6]. Now the error will be in general different each time
a fixed code is entered into the DAC, and hence the matching errors introduce random noise,
rather than distortion. Thus, using this strategy, the mismatch error is converted into
wideband noise, only a fraction of which falls in the signal band. This process can be regarded as
zero-order spectral shaping.

First-order spectral shaping can be achieved based on the following considerations. Consider
the input/output characteristics of an ideal SC DAC (Fig. 1). It is possible to choose the elements
used during conversion such that the average value of the output for each code falls on a straight
line. This suppresses harmonic distortion, and eliminates the dc error for any input. Specifically,
if all input elements are used with equal frequency for each code, then the average outputs will
fall on a straight line, and hence the element-value errors will result in a noise with a zero mean
value. This indicates that the power spectral density (PSD) of the mismatch noise has a zero at
dc, and the PSD is nonuniform. This process thus provides first-order mismatch-noise shaping.

There exist numerous techniques for achieving the required equal average usage for the
individual capacitors. In one (called barrel shifting [7]), the capacitors used for the first sample with
value $m_1$ are $C_1, C_2, \ldots, C_{m_1}$; for the second sample with value $m_2$, the set $C_2, C_3, \ldots, C_{m_2+1}$
is used, etc., and the selection wraps around to $C_1$ once the last of the $C_i$ has been used. Another
averaging technique (called individual level averaging [8]) keeps track of the past usage of each
element $C_i$ for each input code, and assigns them so as to keep the average usage uniform. Yet
another (named data-weighted averaging [9]) uses the set $C_1, C_2, \ldots, C_{m_1}$ for the first sample,
$C_{m_1+1}, C_{m_1+2}, \ldots, C_{m_1+m_2}$ for the second, etc., with wrap-around to $C_1$ after the last $C_i$
($C_M$) has been used. These techniques have various relative advantages and disadvantages; the
barrel-shifting method is simple to perform, but it can generate undesirable tones in the
passband for some input frequencies, while individual level averaging, which is not susceptible to
tone generation, requires more elaborate digital circuitry, and takes longer to achieve the desired
averaging. Data-weighted averaging is relatively simple, and achieves rapid averaging since no
element will be used twice until all others are used. Other techniques have also been proposed
for achieving first-order noise shaping [10]-[11].
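The index bookkeeping of barrel shifting and data-weighted averaging, as described above, can be written down directly. The following sketch uses 0-based indices (index 0 stands for $C_1$); the function names are hypothetical and only the selection logic is modelled, not the analog behaviour.

```python
def barrel_shift_select(samples, M):
    """Barrel shifting: the starting element advances by one position per sample."""
    selections = []
    for n, m in enumerate(samples):
        start = n % M
        selections.append([(start + j) % M for j in range(m)])
    return selections

def dwa_select(samples, M):
    """Data-weighted averaging: each sample starts where the previous one stopped."""
    selections, pointer = [], 0
    for m in samples:
        selections.append([(pointer + j) % M for j in range(m)])
        pointer = (pointer + m) % M       # advance by the number of elements used
    return selections

codes = [3, 2, 4]                          # input values m_1, m_2, m_3 for M = 8 elements
print(barrel_shift_select(codes, 8))
print(dwa_select(codes, 8))
```

With data-weighted averaging the third sample wraps around to the first element only after all eight elements have been used once, which is the property that gives the rapid averaging mentioned in the text.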

To achieve higher-order noise shaping, the binary logic signal $x_i(n)$, which decides whether
or not $V_i$ will contribute to $V_{out}$ in the $n$-th sampling period, can be forced to assume the form

$$x_i(n) = f(n) + h(n) * [e_i(n) - r(n)] \tag{1}$$

Here, the asterisk denotes discrete-time convolution; also, $f(n)$ is a bounded function,
independent of $i$, and $h(n)$ is the impulse response of the desired shaping filter. Finally, the $e_i(n)$
are pseudo-random bounded functions, in general different for each $i$. Then, the output error in
the $n$-th period is

$$\mathrm{err}(n) = \sum_i x_i(n)\, dV_i = [f(n) - h(n) * r(n)] \sum_i dV_i + h(n) * \sum_i e_i(n)\, dV_i \tag{2}$$
where $dV_i$ is the output error introduced by the $i$-th mismatched capacitor. If the full-scale
output is accepted as correct, the first term on the RHS is zero (the errors $dV_i$ then sum to zero),
and the second term contains the desired filter function. Hence, if we can generate a set of binary
logic sequences $x_i$ such that in each sampling period they satisfy eq. (1) and their sum equals the
input value $m$, the error shaping is accomplished.
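For readers who want the intermediate step, the error expression follows from substituting the assumed form of the selection signals into the sum of the individual element errors. The derivation below is a sketch that assumes the sums run over all $M$ elements and that convolution with $h(n)$ acts only on the time index, so it can be moved outside the sum over $i$.

```latex
\begin{align*}
\mathrm{err}(n) &= \sum_{i=1}^{M} x_i(n)\, dV_i \\
  &= \sum_{i=1}^{M} \bigl\{ f(n) + h(n) * [e_i(n) - r(n)] \bigr\}\, dV_i \\
  &= [f(n) - h(n) * r(n)] \sum_{i=1}^{M} dV_i
     \;+\; h(n) * \sum_{i=1}^{M} e_i(n)\, dV_i .
\end{align*}
```

When all $M$ elements are switched in, the error is $\sum_i dV_i$; taking the full-scale output as the reference therefore amounts to $\sum_i dV_i = 0$, which removes the first term and leaves only the contribution filtered by $h(n)$.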


Figure 4: Digital delta-sigma loop for generating the $x_i(n)$ sequence of eq. (1)

Consider next the digital delta-sigma loop shown in Fig. 4 [12]. Analysis shows that its
single-bit output sequence $x_i(n)$ is given exactly by eq. (1), if the truncation error of the
comparator is denoted by $e_i(n)$, and if $H(z)$ is the z-transform of $h(n)$. Hence, $M$ such structures
(one for each capacitor $C_i$) can be used to generate the $x_i(n)$ sequences for the operation of the
DAC.

For a positive integer system, the common input $f(n)$ of the loops can be chosen so that the
input of the truncation block in one of the loops is zero, and in all others it is positive. This will
minimize the signals in the loops, and hence helps to keep their operation stable. The sequence
$r(n)$ is essentially a time-variable threshold for the comparators. It is chosen such that exactly
$m(n)$ of the $M$ loops have outputs $x_i(n) = 1$ during period $n$.
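Choosing a threshold so that exactly $m(n)$ comparators fire is equivalent to switching on the $m(n)$ loops with the largest comparator inputs. The sketch below shows only this selection step. The demonstration that follows it uses the negated usage count of each element as a stand-in for the loop state, which is an assumption made for illustration and not the loop of Fig. 4; with that choice the selection keeps the element usage balanced.

```python
def select_elements(loop_values, m):
    """Return x_i in {0, 1} with exactly m ones, taking the m largest loop values.
    This corresponds to a threshold between the m-th and (m+1)-th largest value."""
    order = sorted(range(len(loop_values)), key=lambda i: loop_values[i], reverse=True)
    x = [0] * len(loop_values)
    for i in order[:m]:
        x[i] = 1
    return x

def usage_after(codes, M):
    """Illustrative stand-in: the loop value of element i is minus its past usage."""
    usage = [0] * M
    for m in codes:
        x = select_elements([-u for u in usage], m)
        usage = [u + xi for u, xi in zip(usage, x)]
    return usage

print(select_elements([0.2, 0.9, -0.1, 0.5], 2))
print(usage_after([3, 1, 2, 3, 2], 4))
```

Ties are resolved in favour of the lower index because Python's sort is stable; a hardware implementation would need its own tie-breaking rule.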


Figure 5: Unshaped (a) and shaped (b) output spectra of the two-capacitor DAC


The mismatch error shaping process can be applied to other structures, such as the
two-capacitor serial DAC described above and shown in Fig. 3 [5], [13]. Whereas in the case of the
M-element DAC of Fig. 1 the degree of freedom which allowed spectral error shaping without
changing the signal processing function was the arbitrary choice of the $C_i$ in generating the
analog output, here there is the option of interchanging the roles of $C_1$ and $C_2$ in each clock
cycle when $\phi_1 = 1$. By generating, with a digital delta-sigma loop, a binary sequence $x(n, k)$
which decides the role of the capacitors during the conversion of the $k$-th bit of the $n$-th input
word, a filtered mismatch error can be obtained. Fig. 5 compares the unshaped and shaped
output spectra of the DAC for a sinewave input with a peak-to-peak amplitude of $0.7 V_{ref}$, an
oversampling ratio of 10 and an assumed mismatch of $\delta = 0.1\%$. Third-order noise shaping
and dithering were used in the loop generating $x(n, k)$. The unshaped error gives a S/THD ratio
of only 70 dB; the S/(N+THD) for the mismatch-shaped spectrum is around 96 dB, a gain of
26 dB. Note that unlike the error-cancelling scheme discussed earlier for this structure, mismatch
shaping does not double (or change in any way) the conversion time; the only cost is the added
digital circuitry, which is insignificant.

VI.  Conclusions 

In  this  tutorial  paper,  it  was  shown  that  very  high  accuracy  and  linearity  may  be  obtained 
in  data  conversion  even  when  using  inaccurate  analog  components,  by  introducing  additional 
digital  logic  which  takes  advantage  of  the  hidden  degrees  of  freedom  in  the  operation  of  the 
converter  circuit  to  achieve  cancellation,  calibration  or  frequency  shaping  of  the  error  introduced 
by  the  analog  imperfections.  This  enables  the  designer  of  mixed-mode  interface  circuits  to 
satisfy  the  increasing  demands  for  ever  faster  and  more  accurate  fully  integrated  data  converters. 

Abstract. An experimental demonstration of an error-free 100 Gbit/s
optical  time  division  multiplexing  (OTDM)  broadcast  star  computer  interconnect  is 
presented.  A  highly  scalable  novel  node  design  provides  rapid  inter-channel 
switching  capability  on  the  order  of  the  single  channel  bit  period  (1.6  ns). 

I.  Introduction 

Although  lightwave  technology  is  meeting  the  demand  for  point-to-point  and  long-haul 
transport  of  digital  information,  routing  packets  at  the  nodes  of  the  network  has  typically  been 
carried out using electronically switched backplane routers. The growing capacity on the
Internet  is  placing  an  ever  greater  demand  on  electronic  routing  technologies.  While  WDM 
can  support  large  aggregate  traffic  bandwidths,  it  is  difficult  to  perform  routing  functions 
which  may  involve  challenging  techniques  such  as  dense  wavelength  conversion.  Additionally, 
present  WDM  laser  and  filter  tuning  techniques  rely  upon  slow  technologies  which  increase  the 
channel  access  latency  and  reduce  the  effective  network  bandwidth. 

Recent advances in optical time division multiplexing (OTDM) have proven this
technology’s capability to handle the switching and routing needs of future networks. Channel access in
OTDM networks is achieved by using time slot tuners and all-optical demultiplexers. Timing
precision of less than 1 ps is required to tune, multiplex, and demultiplex individual channels
within the OTDM frame.

The  computer  interconnect  we  are  constructing  is  based  upon  an  OTDM  broadcast  star 
architecture.  The  high-level  architecture  and  node  design  is  shown  in  Fig.  1.  Nodes  transmit 
information  at  a  slow  data  rate,  B,  by  modulating  picosecond  optical  pulses.  By  using  a 
scalable  time  slot  tuner,  the  pulse  is  appropriately  delayed  to  correspond  to  the  desired 
destination  time  slot.  Data  pulses  from  all  nodes  are  multiplexed  into  a  time  frame  with  an 
aggregate bandwidth of $NB$, where $N$ is the number of nodes in the network. The pulse
spacing between adjacent channels is $(NB)^{-1}$, typically less than 10 ps to achieve 100+ Gbit/s.
Ultrafast  all-optical  demultiplexers  like  the  TOAD  are  used  to  extract  the  desired  channel  from 
the  high  capacity  OTDM  frame  at  the  node  receivers.  Nodes  can  select  the  received  time  slot 
by  using  a  time  slot  tuner  to  align  the  clock  with  an  incoming  time  slot  within  the  frame  for  all- 
optical  demultiplexing. 
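The relation between the per-node rate, the number of time slots and the slot spacing is simple arithmetic, sketched below. The numbers are illustrative assumptions chosen to reproduce the 10 ps and 100 Gbit/s figures quoted in the text (a 1.6 ns bit period corresponds to 625 Mbit/s per channel); they are not the testbed configuration, and the function name is hypothetical.

```python
def otdm_frame(n_slots, channel_rate_bps):
    """Aggregate rate N*B of the OTDM frame and the spacing 1/(N*B) between slots."""
    aggregate_bps = n_slots * channel_rate_bps
    slot_spacing_s = 1.0 / aggregate_bps
    return aggregate_bps, slot_spacing_s

channel_rate = 1.0 / 1.6e-9               # single-channel rate for a 1.6 ns bit period
aggregate, spacing = otdm_frame(160, 625e6)
print(channel_rate)                        # approximately 625 Mbit/s
print(aggregate, spacing)                  # 100 Gbit/s aggregate, 10 ps slot spacing
```

A fully populated 100 Gbit/s frame at this channel rate would therefore hold 160 time slots of 10 ps each within one 1.6 ns bit period.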

To  perform  the  functionality  of  a  router,  addresses  are  mapped  to  specific  time  slots 
within the network. Routing is achieved by sending each bit of the packet in a unique time slot
corresponding  to  its  destination  node.  All  nodes  in  the  network  are  synchronized  by  splitting 
and  amplifying  the  optical  output  of  a  single  modelocked  fiber  laser.  Packet  routing  is 
performed  by  rapidly  changing  the  state  of  the  time  slot  tuner  to  transmit  into  time  slots 
corresponding  to  destination  addresses  on  the  network. 

Fig. 1 OTDM router and node architecture


Recently,  several  experimental  demonstrations  [1-3]  have  shown  that  OTDM  can  meet 
many  of  the  demanding  needs  of  a  router  and  a  multiprocessor  interconnect  system  which 
include  full  connectivity,  low  latency,  and  high  aggregate  throughput,  reliability,  and 
scalability.  We  report  the  demonstration  of  a  testbed  for  a  bit-interleaved  100-Gbit/s  OTDM 
broadcast  star  architecture  that  was  previously  proposed  [4].  Unique  to  our  network  is  a 
highly  scalable,  novel  node  design  that  provides  inter-channel  switching  within  the  single 
channel  bit  period  (1.6  ns).  By  combining  this  hardware  with  a  highly  efficient  arbitration 
protocol  [4],  near  lossless  channel  allocation  with  low  latency  is  achievable  for  high  speed 
switching  applications  such  as  future  all-optical  routers. 

II. Experimental Demonstration and Results

Fig.  2  shows  the  network  and  novel  node  architecture  experimental  setup.  The  two  key 
optical  components  of  the  node  are  the  recently  developed  fast  tunable  delay  line  (FTDL)  [5] 
and  the  terahertz  optical  asymmetric  demultiplexer  (TOAD)  [6].  A  controller  card  residing  in 
a workstation sends electronic NRZ data at the single channel bit rate, B, and control bits to
the driver board specially designed to control the two FTDLs on the clock and data fibers.
The FTDLs consist of cascaded feed-forward Mach-Zehnder fiber delay lattices designed to
produce optical copies of the incoming pulse stream organized into 2^k-bit subcells spaced by T
with intra-subcell bit spacing τ [5]. The two modulators controlled by the driver board select
one of the 2^k × 2^k (= N) time slots into which one of the copies is transmitted. The FTDLs in
the node are used to transmit data into a selected time slot within the OTDM frame and align
the clock with a given time slot for optical demultiplexing. Ultimately, the dimensionality of
the network, N, is determined by k, the number of stages in the FTDL. The intermediate
processing bandwidth, B' (= 1/T), of the driver controller and the electro-optic modulators is
designed to match the repetition rate of the picosecond pulsed fiber laser source and is related
to the single channel bit rate as B' = 2^k B. Pulses are amplified by EDFAs and distributed to
the individual nodes by 1×N splitters. After node data modulation and time slot selection, the
data is multiplexed by precision fiber delays feeding an N×N star coupler. The high
bandwidth  OTDM  frame  is  broadcast  to  all  nodes  in  the  network.  Each  node  can  demultiplex 
any single channel from the frame using an FTDL on the clock and a TOAD.

Fig. 2 Experimental OTDM computer interconnect and node architecture


In our experimental testbed, we populated 16 (= N) time slots in the OTDM frame by
constructing 2 (= k) stage FTDLs. The single channel data rate was chosen to match the OC-12
rate (B = 622.08 Mbit/s). The 2-ps pulsed 1550-nm fiber laser repetition rate and
intermediate electronic processing bandwidth were set to the OC-48 rate (B' = 1/T = 2.48832
GHz). The simple electronic design of the driver board permits the rapid control of the FTDL
and provides low latency, arbitrary channel selection. The driver board was constructed using
4-bit electronic multiplexers (Vitesse) and simple logic operating at the OC-48 rate. To
produce an OTDM frame with an aggregate bit rate of 100 Gbit/s, τ = 10 ps was chosen.
Each TOAD was designed with a demultiplexing window width of about 10 ps at FWHM, and
a polarization splitter was used to separate data from clock at the output.
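
The timing relations quoted above can be collected in a short sketch. The Python below is only an illustration under stated assumptions: the class `OtdmFrame`, its field names and the subcell-major slot numbering used in `slot_delay` are hypothetical and are not taken from the testbed. The relations N = 2^k × 2^k, B' = 2^k B, T = 1/B' and the bit spacing τ are those given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OtdmFrame:
    """Timing parameters of a bit-interleaved OTDM frame (illustrative only)."""
    k: int           # number of FTDL stages
    bit_rate: float  # single channel bit rate B, in bit/s
    tau: float       # bit spacing inside a subcell, in seconds

    @property
    def subcells(self) -> int:
        return 2 ** self.k

    @property
    def slots(self) -> int:
        # N = 2^k x 2^k time slots per frame
        return self.subcells ** 2

    @property
    def laser_rate(self) -> float:
        # B' = 2^k * B, the repetition rate of the pulsed source
        return self.subcells * self.bit_rate

    @property
    def subcell_period(self) -> float:
        # T = 1 / B', the spacing between subcells
        return 1.0 / self.laser_rate

    @property
    def burst_rate(self) -> float:
        # 1 / tau, the bit rate seen inside a subcell
        return 1.0 / self.tau

    def slot_delay(self, slot: int) -> float:
        """Delay of a slot from the frame start.

        Assumption: slots are numbered subcell by subcell, so that
        slot = subcell_index * 2^k + bit_index.
        """
        if not 0 <= slot < self.slots:
            raise ValueError("slot index outside the frame")
        subcell, bit = divmod(slot, self.subcells)
        return subcell * self.subcell_period + bit * self.tau


if __name__ == "__main__":
    frame = OtdmFrame(k=2, bit_rate=622.08e6, tau=10e-12)
    print(frame.slots, frame.laser_rate, frame.burst_rate)
    print([frame.slot_delay(s) for s in range(frame.slots)])
```

With k = 2, B = 622.08 Mbit/s and τ = 10 ps the sketch reproduces the values of the testbed: 16 time slots, B' = 2.48832 GHz and a rate of 1/τ = 100 Gbit/s inside a subcell.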

The  100-Gbit/s  multiplexing  and  demultiplexing  experimental  results  are  shown  in  Fig. 
3.  According  to  the  design  of  the  FTDL,  the  16  time  slots  in  our  OTDM  frame  are  arranged  in 
4  subcells  each  containing  4  time  slots  spaced  by  10  ps.  Our  network  demonstration  focused 
on  one  of  the  subcells  within  the  frame.  Fig.  3a  shows  the  aggregate  eye  diagram  for  a  subcell 
with  multiplexed  data  from  4  nodes  with  a  fixed  pattern,  1  -  pseudorandom  -1-0,  on  a 
bandwidth  limited  detector  (34-GHz  photodetector,  50-GHz  oscilloscope).  Upon 
demultiplexing  by  TOADs  tuned  to  the  individual  channels,  each  is  resolved  in  Fig.  3b  (the 
4th time slot is omitted as it is 0).

Fig. 3 100-Gbit/s multiplexed data OTDM subcell eye diagram on bandwidth-limited
detector, and demultiplexed TOAD output eye diagrams for three channels in subcell.

a)  100  Gbit/s  multiplexed  data  OTDM  subcell  eye  diagram 

b)  Demultiplexed  TOAD  output  eye  diagrams 


We  constructed  two  fully  functional  nodes  to  measure  the  bit  error  rate  (BER)  and 
demonstrate  the  rapid  inter-channel  switching  capability  of  the  network  nodes  using  an 
arbitration  protocol.  These  experiments  were  performed  using  adjacent  channels  in  the  same 
100-Gbit/s  subcell  (Channels  0  and  1).  Fig.  4a  shows  a  plot  of  the  BER  versus  the  single 
channel  average  data  input  power  at  the  TOAD  when  Chan  0  and  Chan  1  were  modulated 
with pseudorandom data. For average data and clock input powers greater than -21 dBm (13
fJ pulse energy) and -8 dBm (250 fJ pulse energy), respectively, several hours of error-free
operation  have  been  achieved.  Additionally,  we  have  observed  that  the  TOAD  can  provide 
gain  to  the  demultiplexed  signal.  The  inset  to  Fig.  4a  shows  the  eye  diagram  of  the  data  input 
(upper  trace)  and  demultiplexed  output  (lower  trace)  of  a  TOAD  demultiplexing  a  single 
channel  of  pseudorandom  data  with  identical  oscilloscope  settings.  The  demultiplexed  output 
is  larger  in  amplitude  than  the  input  by  approximately  6  dB. 
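
The pulse energies quoted with the input powers can be checked with a few lines of Python. This is a sketch under one assumption that is ours and not stated in the text: the energy is taken as the average power divided by the 622.08-MHz single channel rate. The function names are hypothetical.

```python
def dbm_to_watts(power_dbm: float) -> float:
    """Convert a power level in dBm to watts."""
    return 1e-3 * 10 ** (power_dbm / 10.0)


def energy_per_bit_slot(power_dbm: float, rate_hz: float) -> float:
    """Average energy per bit slot in joules.

    Assumption: the average power is spread evenly over the bit slots
    of a channel running at rate_hz.
    """
    return dbm_to_watts(power_dbm) / rate_hz


if __name__ == "__main__":
    channel_rate = 622.08e6  # OC-12 single channel rate from the text
    for label, level in (("data", -21.0), ("clock", -8.0)):
        energy_fj = energy_per_bit_slot(level, channel_rate) * 1e15
        print(f"{label}: {level} dBm -> {energy_fj:.1f} fJ")
```

Under this assumption the two levels give energies close to the quoted 13 fJ and 250 fJ.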
The fast inter-channel switching capability of the network was also demonstrated by
using  a  previously  proposed,  low  latency  arbitration  protocol  [4]  and  two  nodes  of  the 
network.  The  receivers  of  both  nodes  are  fixed  to  listen  to  their  own  time  slots.  Each  node 
transmits  its  binary  address  at  the  single  OC-12  channel  rate  into  its  own  time  slot.  If 
successfully  received,  each  node  then  transmits  its  address  into  the  time  slot  of  the  other  node. 
Fig.  4b  shows  a  demonstration  of  the  protocol  using  two  nodes  in  the  network  whose  time 
slots  are  adjacent  in  the  100-Gbit/s  subcell.  The  addresses  assigned  to  Node  0  and  Node  1 
were  0101  and  0111  respectively.  The  traces  shown  are  the  demultiplexed  TOAD  outputs 
directly  from  the  analog  output  of  the  receivers  for  the  two  nodes.  After  each  node 
successfully  receives  its  own  address,  the  FTDLs  rapidly  reconfigure  within  a  single  bit  period 
to  transmit  into  the  time  slot  of  the  other  node.  Note  that  each  node  now  successfully  receives 
the  address  of  the  other  in  its  own  time  slot.  The  FTDLs  and  driver  board  electronics  are 
capable  of  tuning  to  any  one  of  the  16  time  slots  in  the  network  within  1.6  ns,  greatly  reducing 
the  hardware  latency  of  the  protocol. 
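
The two steps of the demonstration (loop-back of the own address, then exchange of addresses) can be sketched as a toy model. The code below is not the arbitration protocol of [4]; it only mimics the sequence described above. The `Node` class, the function names and the treatment of a collision (a slot written twice is returned as `None`) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    slot: int          # time slot the receiver is fixed to
    address: str       # binary address sent at the single channel rate
    tx_slot: int = -1  # slot currently selected by the transmit FTDL
    received: List[Optional[str]] = field(default_factory=list)


def run_round(nodes: List[Node]) -> None:
    """Every node writes its address into tx_slot, then reads its own slot."""
    frame: Dict[int, Optional[str]] = {}
    for node in nodes:
        # Assumption: two writers in the same slot corrupt it (modelled as None).
        frame[node.tx_slot] = None if node.tx_slot in frame else node.address
    for node in nodes:
        node.received.append(frame.get(node.slot))


def handshake(a: Node, b: Node) -> bool:
    """Loop back the own address first, then swap the transmit slots."""
    a.tx_slot, b.tx_slot = a.slot, b.slot
    run_round([a, b])
    if a.received[-1] != a.address or b.received[-1] != b.address:
        return False
    a.tx_slot, b.tx_slot = b.slot, a.slot  # retune the transmit FTDLs
    run_round([a, b])
    return a.received[-1] == b.address and b.received[-1] == a.address


if __name__ == "__main__":
    node0 = Node("Node 0", slot=0, address="0101")
    node1 = Node("Node 1", slot=1, address="0111")
    print(handshake(node0, node1), node0.received, node1.received)
```

The addresses 0101 and 0111 and the adjacent slots are those of the experiment; everything else in the model is hypothetical.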


Fig. 4 BER of channels 0 and 1 against average single channel input power, and
demonstration of rapid channel selection on bandwidth-limited analogue detector

a) BER of channels 0 and 1 against average single channel input power (o: channel 0, +: channel 1)
Inset: TOAD input and output eye diagrams demonstrating gain

b) Demonstration of rapid channel selection

III. Conclusion

We  have  demonstrated  a  fully  connected  100-Gbit/s  OTDM  network  architecture  that  offers 
fast  switching  among  data  channels  with  reliable,  error  free  operation  and  low  latency.  Since 
the active components of the FTDLs do not scale with the number of nodes [5], simply adding
another stage, k = 3 (3 dB additional loss per node), scales the interconnect up to 64 (= N)
nodes without taxing the power budget significantly. If OC-24 (B = 1.24416 GHz) is chosen
as the single channel data rate and 10-GHz (= B') intermediate processing bandwidth
electronics  are  used,  an  80-Gbit/s  interconnect  with  a  rapid  inter-channel  switching  speed  of 
800  ps  is  feasible.  In  such  a  64-processor  architecture,  coherent  crosstalk  does  not  limit  the 
BER  performance  significantly  [7].  Since  the  demultiplexer  [8]  and  other  optical  components 
in  the  node  can  be  integrated,  we  believe  this  network  is  practical  for  future,  high-speed 
multiprocessor  interconnect  systems. 
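
The scaled figures follow from the relations of Section II. The short worked arithmetic below is one reading that is consistent with the quoted numbers; identifying the 80-Gbit/s figure with the product N·B and the switching time with the single channel bit period 1/B is our assumption.

```latex
\begin{align*}
N &= 2^{k} \times 2^{k} = 2^{3} \times 2^{3} = 64, \\
2^{k} B &= 8 \times 1.24416~\text{GHz} = 9.95328~\text{GHz} \approx 10~\text{GHz} = B', \\
N B &= 64 \times 1.24416~\text{Gbit/s} \approx 79.6~\text{Gbit/s} \approx 80~\text{Gbit/s}, \\
\frac{1}{B} &= \frac{1}{1.24416~\text{GHz}} \approx 804~\text{ps} \approx 800~\text{ps}.
\end{align*}
```

The same relation gives 1/B = 1/(622.08 MHz) ≈ 1.6 ns for the testbed, which matches the switching time reported in Section II.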
Abstract. In this communication a novel CMOS amplifier providing a
differential  gain  higher  than  20  dB  and  a  cut-off  frequency  of  200  MHz  is 
presented.  The  circuit  includes  a  single-to-differential  input  converter  that, 
unlike  traditional  approaches,  avoids  reducing  the  very  high  input  resistance 
of  the  main  differential  amplifier.  Moreover,  thanks  to  an  auxiliary  section,  an 
extra 6-dB gain is achieved. The whole amplifier has been designed with a
0.8-µm p-well technology and uses a supply voltage of 3 V.

I.  Introduction 

A fully differential approach is usually required in high-frequency applications [1-5], due to its
attractive and well-known properties of immunity to common-mode disturbances, rejection of parasitic
couplings and increased dynamic range [6-7]. However, there are cases in which a single-ended source
comes from an external filter although the differential approach must be preserved on the chip.
Examples are circuits placed in cascade with RF image filters and IF filters. In addition,
there are circuits, such as four-quadrant multipliers, that require pure differential signals to perform
their function. In all these cases, a stage able to convert a single-ended signal into a differential one is
needed.

A differential pair with one grounded input terminal can perform this basic function, if the
symmetry of the output is not the main goal. To this end, to provide a proper bias condition to the
differential pair, the most common solution is to use the R-C network around the main amplifier
as shown in Fig. 1. For sufficiently high frequencies (where capacitor C can be assumed short-circuited)
the gain of the amplifier is therefore

$$\frac{v_{out}}{v_{in}} = \frac{g_{m1,2}}{g_{m3,4}} \qquad (1)$$

where $g_{mi}$ is the transconductance of the i-th transistor. The main drawback of this approach is
represented by the heavy reduction of the input resistance of the amplifier. This can represent a serious
problem in many applications. Additionally, the use of high values of resistance R is area consuming
and worsens the noise performance of the amplifier. Moreover, as the frequency increases, the CMRR
decreases, causing in turn a reduction in the symmetry of the output.


Fig. 1. Single-input differential-output amplifier with traditional input biasing

II. Proposed Solution

To  overcome  the  previously  mentioned  limitations,  the  arrangement  in  Fig.  2  was  developed. 

Fig. 2. Simplified schematic of the proposed amplifier

Observe that IF CMOS stages can profitably take advantage of active loads based on diode-connected
transistors. Indeed, unlike bipolar transistors, amplifier gains which are set by the ratio of
MOS transconductances of transistors with different dimensions can provide values higher than 20 dB
at operating frequencies of hundreds of megahertz [8]. The DC condition of the circuit in Fig. 2 is now
accurately set by the auxiliary amplifier A2, which senses any deviation of the output voltage from zero,
properly driving A1 and, in turn, the gate of M2. This feature reduces the output offset and compensates
for any parameter mismatch in the differential stage. On the other hand, for frequencies where
capacitor C can be assumed short-circuited, the auxiliary amplifier A2 gives a virtual ground on the
source of the coupled pair M1-M2. More specifically, assuming a finite differential gain for A1 equal to
$A_1$, and denoting with $r_B$ the output resistance of current generator $I_B$, the voltage at the gate of M2 is


$$v_2 = -\frac{g_{m1} A_1}{g_{m1} + \dfrac{1}{r_B} + g_{m2}(1 + A_1)}\, v_{in} \approx -\frac{g_{m1} A_1}{g_{m2}(1 + A_1)}\, v_{in} \qquad (2)$$


Thus, if $A_1 \gg 1$ and $g_{m1} \approx g_{m2} = g_{m1,2}$, a signal about equal to $v_{in}$, but with a phase shift of 180°, is
provided to the gate of M2. This feature allows an extra 6-dB gain to be achieved. In fact the gain is
now given by

$$A = \frac{2 g_{m1,2}}{g_{m3,4}} \qquad (3)$$
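
The effect of a finite gain $A_1$ on the extra 6 dB can be estimated numerically. The Python sketch below evaluates the expression for the gate voltage of M2 and a simple gain model; the model (differential input equal to $v_{in} - v_2$ across a pair with diode-connected loads), the function names and all numerical values are assumptions chosen for illustration, not data from the design.

```python
import math


def gate_drive_ratio(gm1: float, gm2: float, a1: float, r_b: float) -> float:
    """v2/vin at the gate of M2 for finite auxiliary gain a1 (cf. Eq. (2))."""
    return -gm1 * a1 / (gm1 + 1.0 / r_b + gm2 * (1.0 + a1))


def gain_db(gm_in: float, gm_load: float, v2_over_vin: float = 0.0) -> float:
    """Differential gain in dB of a pair with diode-connected loads.

    Assumption: the pair sees the differential input vin - v2, so the gain is
    (gm_in / gm_load) * (1 - v2/vin). v2 = 0 is the grounded-gate case.
    """
    return 20.0 * math.log10((gm_in / gm_load) * (1.0 - v2_over_vin))


if __name__ == "__main__":
    gm_in, gm_load = 1e-3, 2e-4   # hypothetical transconductances, in siemens
    ratio = gate_drive_ratio(gm_in, gm_in, a1=100.0, r_b=100e3)
    extra = gain_db(gm_in, gm_load, ratio) - gain_db(gm_in, gm_load)
    print(f"v2/vin = {ratio:.4f}, extra gain = {extra:.2f} dB")
```

For large $A_1$ the ratio tends to -1 and the extra gain tends to 20 log10(2) ≈ 6.02 dB; a finite $A_1$ gives slightly less.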


The  detailed  schematic  of  the  amplifier  is  shown  in  Fig.  3,  where  the  auxiliary  amplifiers  were 
implemented  with  simple  differential  stages  with  mirror  active  load,  and  where  a  common  drain 
transistor,  M5,  with  the  associated  bias  current  generator  IB2  was  introduced  for  level-shifting 
purposes. 

A final observation concerns the stability of the loop made up of A1, the main amplifier and M5.
Let $r_{o1}$ and $C_{o1}$ be the equivalent output resistance and capacitance at the output of A1, respectively,
and $C_S$ the total capacitance at the source of M1-M2. By breaking the loop at the gate of M5 and
restoring the load conditions we find the expressions of the loop gain, $T_0 = g_{m6,7}\, r_{o1}/2$, the dominant
pole, $1/(r_{o1} C_{o1})$, and the second pole $\omega_2 = 2 g_{m1,2}/C_S$. Hence, to achieve a given phase margin, $\phi$,
we have to set $\omega_{GBW} = \omega_2/\tan\phi$, that is

$$C_{o1} = \tan\phi\; \frac{g_{m6,7}}{4\, g_{m1,2}}\; C_S \qquad (4)$$


Fig.  3.  Detailed  schematic  of  the  proposed  amplifier 

III.  Simulations 

The circuit in Fig. 3 was simulated with SPICE using the parameters of a 0.8-µm p-well
technology. The supply voltage was set to 3 V. Transistor dimensions and bias currents are reported in
Tab. I. The loop stability was ensured with these settings. Only an additional 1-pF capacitor was
connected between the gate and source of transistor M5, to provide a feed-forward compensation.

Fig. 4 illustrates the frequency responses of the complete solution in Fig. 3 and that of the
amplifier in Fig. 1. For the latter the same settings as in Tab. I were chosen. Moreover, R = 10 kΩ and
C = 10 pF were set. It can be observed that the proposed amplifier achieves a gain of 21.5 dB, which is
6 dB higher than that in Fig. 1. The upper cut-off frequency is higher than 200 MHz. The flat band with
a gain variation of 0.5 dB ranges from 10 to 80 MHz (the lower limit being due to the chosen
capacitor value of 10 pF). For the same two circuits, Fig. 5 illustrates the CMRR, defined as
$\mathrm{CMRR} = |V_{o,dm}/V_{o,cm}|$, where $V_{o,dm}$ and $V_{o,cm}$ are the differential- and common-mode output voltages,
respectively. For almost all the frequency range of interest the circuit in Fig. 3 exhibits a better CMRR
with a maximum difference of about 95 dB at 70 MHz.


Tab. I. Circuit Parameters

Parameter            Value
M1, M2               30/0.8
M3, M4               3/0.8
M5, M6, M7           20/0.8
M8, M9, M12, M13     6/0.8
M10, M11             2/0.8
IB1                  80 µA
IB2, IB4             20 µA
IB3                  160 µA
C                    10 pF

Fig. 4. Frequency response (dB) of the circuits in Fig. 1 (curve a) and Fig. 3 (curve b)

Fig. 5. CMRR (dB) of the circuits in Fig. 1 (curve a) and Fig. 3 (curve b)
Abstract. A CMOS 7th-order equiripple-phase filter for PRML read channel applications
is  presented.  The  key  element  of  this  design  is  a  Gm  cell  in  which  the  unbalanced  differential 
pairs  used  to  increase  the  linearity  of  the  transconductor  core  also  introduce  a  zero  which  cancels 
the  parasitic  pole  of  a  high-impedance  folded-cascode  output  stage.  The  filter  was  designed 
using a 0.35µm 3.3V process; all simulation results presented here use a supply voltage of
2.5V. The total power consumption is 110mW for a cut-off frequency of 100MHz without boost.

1.  Introduction 

Continuous-time  filters  are  the  only  practical  solution  for  signal  frequencies  above  100MHz.  A 
prime  application  in  this  area  as  far  as  design  challenges  and  market  opportunities  are  concerned 
is  the  read  channel  for  data  storage  systems.  The  combination  of  low-pass  filter/equaliser  limits 
the  passband  noise,  increasing  the  signal-to-noise  ratio,  and  adjusts  the  read-back  signal  to 
the  target  pulse  shape.  The  common  partial-response  targets  require  the  equaliser  to  provide 
amplification  (or  boost)  at  frequencies  around  0.4  times  the  data  rate,  with  programmable  values 
up  to  12dB.  The  cutoff  frequency  of  the  low-pass  filter  must  be  programmable  with  a  typical 
tuning  range  of  8:1  to  take  advantage  of  the  zone  bit  recording  technique  where  the  data  rate  is 
increased  on  a  zone-by-zone  basis  as  the  head  moves  from  the  innermost  to  the  outermost  track. 

2. Proposed Transconductor and Filter

The first and third differential pairs of the proposed transconductor of Figure 1 are deliberately
unbalanced so that $(W/L)_1 = n(W/L)_2$ and $(W/L)_6 = n(W/L)_5$, while transistors M2, M3,
M4 and M5 are identical. The mismatched differential pairs increase the linear input range [3]
as illustrated in Figure 3(a). In addition, the mismatched differential pairs introduce a zero
which can be used to cancel the parasitic pole created by the folded-cascode output stage [2].
Both the magnitude and phase responses of the proposed transconductor are shown in Figure 3(b).

The complete results of a detailed AC analysis are not presented here, but the general
principle behind the pole-zero cancellation is outlined below. Unlike the classical case, the
common-source node of a mismatched differential pair is not AC ground. It introduces a zero,
$Z_1$, given by

Zi  „  (ffm!6  +  ffm25)ffm34  +  4 ffml6gm25  Qds_ 

(<7ml6  +  £m25  +  9mZi)Cds  Cds  ' 

This  work  is  supported  by  Silicon  Systems  Limited  (SSL)  and  PEI  Technologies,  University  of  Limerick. 


Figure  1 :  Transconductor  consists  of  three  differential  pairs  and  a  folded-cascode  output  stage. 


Figure 2: Filter structure consists of one single-pole stage followed by three biquadratic stages.

where $g_{m16}$, $g_{m25}$ and $g_{m34}$ represent the transconductances of transistors $M_{1,6}$, $M_{2,5}$ and $M_{3,4}$
respectively, and $g_{ds}$ is the drain-source conductance and $C_{ds}$ is the drain-source capacitance of
transistors $M_{11,12,13}$. The location of the zero can be moved to cancel the parasitic pole created
by the folded-cascode output stage through appropriate scaling of bias transistors $M_{11}$ and $M_{13}$.

The  filter  boost  is  realised  using  two  real  zeros.  The  basic  requirement  is  that  the  zeros 
are  symmetrical  with  respect  to  the  imaginary  axis  so  as  not  to  interfere  with  the  group  delay. 
However, by introducing a controlled asymmetry one can adjust the overall group delay to
compensate for phase errors introduced by other blocks within the channel. The filter structure [1]
shown  in  Figure  2  realises  the  zeros  independently.  Just  one  folded-cascode  stage  is  required 
for  each  integrating  node.  The  natural  frequency  of  the  last  biquad  is  over  twice  the  filter  cutoff 
frequency  which  results  in  very  low  values  for  its  integrating  capacitances.  To  avoid  this  the 
transconductor  cores  of  this  biquad  have  been  doubled.  The  complete  filter  transfer  function  is: 


H(s)  = 
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o\ 


(!+*) 


u. 
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(1  -  -^-) 
V  Uz2  ' 


1  + 


S2  +  S^+U2l 
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S2  +  S^+UJ2o2 


o3 


S 2  +  5^  +  U)2 
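
The role of the two real zeros in the numerator can be illustrated in isolation from the poles. The Python sketch below evaluates only the zero pair $(1 + s/\omega_{z1})(1 - s/\omega_{z2})$ on the imaginary axis; the function names and the normalised frequencies are hypothetical, and the boost computed here is that of the zero pair alone, not of the complete filter.

```python
import cmath
import math


def zero_pair_response(omega: float, omega_z1: float, omega_z2: float) -> complex:
    """Value of (1 + s/omega_z1) * (1 - s/omega_z2) at s = j*omega."""
    s = 1j * omega
    return (1 + s / omega_z1) * (1 - s / omega_z2)


def zero_for_boost(boost_db: float, omega_boost: float) -> float:
    """Symmetric zero frequency whose pair gives boost_db at omega_boost.

    For omega_z1 = omega_z2 = omega_z the pair reduces to 1 + (omega/omega_z)^2.
    """
    return omega_boost / math.sqrt(10 ** (boost_db / 20.0) - 1.0)


def gain_db(h: complex) -> float:
    return 20.0 * math.log10(abs(h))


if __name__ == "__main__":
    wz = zero_for_boost(12.0, 1.0)            # normalised boost frequency
    sym = zero_pair_response(1.0, wz, wz)
    asym = zero_pair_response(1.0, wz, 1.2 * wz)
    print(gain_db(sym), cmath.phase(sym))     # real response: gain, no phase
    print(gain_db(asym), cmath.phase(asym))   # asymmetry adds a phase term
```

With symmetric zeros the pair is purely real, so it adds gain without phase and leaves the group delay untouched; a deliberate asymmetry introduces a frequency-dependent phase, which is the adjustment mechanism mentioned above.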
3. Simulation Results


The filter was designed using a 0.35µm 3.3V process; all simulation results presented here use
a supply voltage of 2.5V. The small-signal response is shown in Figure 4(a) with and without
boost across the entire tuning range. The filter is tuned by varying the bias current of the
Gm cells to give a tuning range of 2:1. Figure 4(b) shows the transient analysis for a 10MHz
200mVp-p sine wave and the associated gain, phase and group delay. The filter THD is -44.5dB.

4.  Conclusion 

In this paper a CMOS 7th-order equiripple-phase low-pass filter for partial-response
maximum-likelihood read channels was presented. The idea of increasing the linear input range of the
transconductor using mismatched differential pairs was complemented by a novel pole-zero
cancellation strategy, independent of tuning, allowing the use of a folded-cascode output stage.

Figure 3: The plot on the left illustrates how combining three differential pairs (two of which
are deliberately unbalanced) and adding their respective output currents increases the available
linear input range. The small-signal plot on the right shows the magnitude and phase response
of an integrator implemented with the proposed transconductor driving a 1pF load capacitance.


Figure 4: The plot on the left shows the magnitude response of the filter with and without boost
across the entire tuning range. The plot on the right shows the filter gain, phase and group delay.

Abstract. This paper presents a switched current multiplier, dedicated
for the use in highly parallel computation arrays or neural networks. It is
designed for 3V supply voltage, performing 50k multiplications per second
with a power dissipation of 100nW and an accuracy better than 1.5% considering
the presence of possible device-mismatch.


I  Introduction 

Analog multipliers are fundamental functional blocks in many circuits and systems. Many
different approaches to building analog multipliers have been investigated; one of them is the
application of the translinear principle [1]. Originally formulated for bipolar devices, the
translinear principle is based on the exponential voltage-to-current transfer characteristics of its
comprising elements. Not only bipolar transistors show the required exponential behavior, but
MOS transistors operating below threshold do, too. Consequently, translinear circuits can be
designed with a standard CMOS technology [2].

Unfortunately, translinear circuits comprising subthreshold MOS transistors are very
vulnerable to device mismatch. This problem is often reported and analyzed [3], but simple
solutions for this problem have not been reported yet. Alsam-Sidqui et al. [4] have proposed to
use floating gates to correct for errors caused by device mismatch. We suggest a dynamic principle
to reduce the influence of device mismatch on the accuracy of a frequently used translinear
multiplier circuit. Our analog multiplier cell is dedicated for the use in massively parallel
computation arrays or in analog neural networks.

In  section  2  a  conventional  translinear  multiplier  and  the  problems  associated  with  it 
when  implementing  it  in  a  standard  CMOS-technology  are  reviewed.  Furthermore,  the  general 
principle  of  our  switched  current  multiplier  is  introduced.  Section  3  deals  with  the  actual 
implementation  of  the  multiplier  circuit  and  provides  some  simulation  results. 

II Translinear Multipliers

A typical translinear multiplier circuit is depicted in Fig. 1a. With the subthreshold transfer
characteristics $I = I_{D0}\exp((V_{GS} - V_T)/(n U_T))$ of a transistor in saturation, where $I_{D0}$ is a
current constant, $V_{GS}$ the gate-source voltage, $V_T$ the threshold voltage, $n$ the subthreshold
slope factor and $U_T$ the thermal voltage, the analysis of the translinear loop yields:

$$I_1 I_3 = I_2 I_4. \qquad (1)$$

As one can see, a multiplication or division of several unipolar currents can easily be performed
using this circuit.

Employing differential signals, this circuit is suitable for two-quadrant operation as
well. The first input signal is then the difference of the currents $I_1 - I_4$, and the output signal is
represented by the current difference $I_2 - I_3$. Using some basic algebra, equation (1) can be
transformed as follows:

$$I_2 - I_3 = \frac{I_1 - I_4}{I_1 + I_4}\,(I_2 + I_3). \qquad (2)$$

As can be seen, the second input signal is the sum current $I_w = I_2 + I_3$.

Fig. 1: (a) A translinear multiplier cell and (b) its switched-current counterpart

There are several drawbacks associated with this multiplier. As stated in the introduction,
this circuit topology is very vulnerable to device mismatch. Mismatch describes the effect that
two identically designed devices have random differences in their behavior. In subthreshold
operation, device mismatch can be modeled by a variation of the current constant $I_{D0}$ from its
nominal value [3]. As the mismatch decreases with increasing transistor area, the influence of
mismatch can be reduced by employing large devices. However, to achieve a high accuracy
the devices would have to be prohibitively large in area.

In order to analyze the influence of mismatch, equation (1) is reformed again, but the current
constant $I_{D0}$ is no longer assumed to be equal in all devices:

$$\frac{I_1 I_3}{I_{D0,1}\, I_{D0,3}} = \frac{I_2 I_4}{I_{D0,2}\, I_{D0,4}}$$

Accordingly, the mismatch of the current constants yields a constant gain error for the translinear
loop. For the differential-signal case the device mismatch entails a non-linearity error:

$$I_2 - I_3 = (I_2 + I_3)\,\frac{(I_1 - I_4)(\varepsilon + 1) + (I_1 + I_4)(\varepsilon - 1)}{(I_1 + I_4)(\varepsilon + 1) + (I_1 - I_4)(\varepsilon - 1)}$$

with

$$\varepsilon = \frac{I_{D0,2}\, I_{D0,4}}{I_{D0,1}\, I_{D0,3}}.$$

Further limitations of this multiplier topology are due to the finite output resistance of
the transistors, the body effect and the voltage dependency of the slope factor $n$. These limitations
are common to all translinear subthreshold MOS circuits and are discussed elsewhere,
see [2], [5].

The idea for the dynamic approach is adapted from the current-copier cell [6], where two
devices are dynamically replaced by one device. The functional principle of the proposed multiplier
is depicted in Fig. 1b. In the first clock cycle the transistors M12 and M34 perform the
function of the outer transistors M1 and M4. When equilibrium is reached, the input currents $I_1$
and $I_4$ equal the drain currents of M12 and M34, respectively, and the voltages across the capacitors
C1 and C2 have an adequate value with respect to the common source voltage $V_{ref}$. In the
second clock cycle all switches are toggled and the gate nodes are disconnected from the rest
of the circuit. Thus, the voltages across the capacitors remain constant and store the input signal.
The transistors M12 and M34 are connected to accomplish the function of the inner transistors
M2 and M3. The second input signal is applied as current $I_w$ into the common source
node, whereas the difference of the drain currents of M12 and M34 represents the adequate
output value.

Fig. 2: Circuit diagram and layout of the proposed multiplier

This circuit operating in discrete time performs the same function as its continuous-time
counterpart without being sensitive to device mismatch: as the functions of the transistors
M1/M2 and M3/M4 are accomplished by the same devices M12 and M34, respectively, there is
no mismatch between the transistors M1-M2 and M3-M4. With $I_{D0,1} = I_{D0,2}$ and
$I_{D0,3} = I_{D0,4}$ the error term $\varepsilon$ of equation (3) becomes unity. Thus, this dynamic multiplier
is inherently insensitive to device mismatch.


III Actual Circuit Implementation

The functional principle of the dynamic multiplier is shown in Fig. 1b. The main inaccuracies
of an implementation are caused by the parasitic capacitances of MOS transistors. From
the first to the second clock cycle the drain and source voltages of M12 and M34 are changing.
According to the capacitive divider formed by the transistors' overlap capacitances and the
storing capacitors C1 and C2, the voltages across these capacitors change. As these voltages are
storing the information, the signal becomes corrupted. In our implementation the storing nodes
are decoupled by means of source followers.

A second problem is the clock feedthrough of the switches connected with the storing
capacitors. When these transistors switch off they release their channel charge. Thus, a part of
the channel charge flows onto the storage capacitors and corrupts the stored signal. To overcome
this problem a compensation technique known from dynamic current mirrors is
employed [7].

The complete circuit diagram is shown in Fig. 2. The translinear elements have a rather
high W/L ratio of W/L = 10 because the subthreshold operation of the transistors for tail currents
up to 20 nA has to be assured. The storage capacitors are implemented as gate-substrate
capacitors. The source follower and the transistors for the charge-injection compensation are
marked in the circuit diagram. All switches are minimum-sized devices. This circuit was
implemented using a 0.6 µm digital standard technology. As the layout in Fig. 2 illustrates, the
circuit is approximately as small as a basic digital standard cell (30 × 31 µm²).

Fig. 3: Simulated multiplication result and multiplication error.

A simulation result with no device mismatch present is provided in Fig. 3. A small residual
error of some 0.2% due to bulk effect and finite output resistance of the transistors
remains. By means of Monte Carlo simulations the maximum error was determined to be
smaller than 1.5%. The main error is caused by the device mismatch of the switch-compensation
transistor pair.

IV  Conclusion 

In this paper an accurate translinear two-quadrant multiplier is presented. It can process
up to 50,000 multiplications per second and dissipates less than 100 nW of power. If the
signal is stored for up to 20 ms, the residual error is always less than 1.5%, which equals
an accuracy of 5 bit. For the derivation of this maximum error the possible mismatch of all
devices was taken into account.
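
As a quick plausibility check of the figures quoted in this conclusion, the sketch below computes the energy per multiplication and compares the 1.5% error bound with half an LSB at 5-bit resolution. Reading "an accuracy of 5 bit" as an error within half an LSB of full scale is an assumption made here for illustration only, and all names are hypothetical.

```python
# Figures quoted in the conclusion.
POWER_W = 100e-9       # upper bound on the power dissipation (100 nW)
RATE_PER_S = 50_000    # multiplications per second
MAX_ERROR = 0.015      # residual error bound (1.5 % of full scale)


def energy_per_multiplication(power_w, rate_per_s):
    """Energy spent per multiplication, in joules."""
    return power_w / rate_per_s


def half_lsb(bits):
    """Half of one LSB relative to full scale (assumed accuracy criterion)."""
    return 0.5 / (2 ** bits)


energy_j = energy_per_multiplication(POWER_W, RATE_PER_S)
print(f"energy per multiplication: {energy_j * 1e12:.1f} pJ")
print(f"half LSB at 5 bit: {half_lsb(5) * 100:.4f} % of full scale")
print(f"1.5 % bound within half LSB at 5 bit: {MAX_ERROR <= half_lsb(5)}")
```

Under this reading the 1.5% bound fits within half an LSB at 5 bit but not at 6 bit, which is consistent with the stated accuracy.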
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Abstract  A  sixth-order  SC  bandpass  filter  based  on  conventional  SC 
integrators  is  considered.  The  capacitor  spread  is  reduced  from  87.1  to 
11.644  by  replacing  two  of  the  conventional  integrators  by  very-large-time 
constant  (VLT)  integrators.  Subsequently  the  integrators  of  the  last  filter 
configuration  are  replaced  by  gain-and-offset-compensated  SC  integrators 
for  reducing  the  influence  of  op  amp  imperfections.  The  effect  of  these 
consecutive  replacements  on  the  amplitude  response  and  on  the  output  offset 
voltage  of  the  filter  is  investigated. 

I.  Introduction 

In order to compare two different filter designs, the capacitor spread is most often used
as the performance criterion. Large capacitor ratios (larger than 30) are difficult to
achieve with any accuracy in integrated form [1]. To reduce the capacitor spread in SC
circuits, different techniques have been proposed [2-4]. Virtually all area-efficient
implementations of VLT integrators suffer from higher sensitivity to finite amplifier gain
and to the offset voltages of the op amps.

II. Sixth-order bandpass SC filter with conventional integrators

Fig. 1 shows a sixth-order bandpass SC filter [1], which is scaled for maximum dynamic
range and output swing. The normalized capacitor values are:

C1 = 87.10; C2 = 71.01; C3 = 87.10; C4 = 87.10; C5 = 87.10; C6 = 68.26;
C11 = 84.20; C21 = 11.90; C31 = 9.53; C41 = 13.04; C51 = 17.03; C61 = 18.00;
C12 = 72.17; C22 = 16.70; C32 = 11.61; C42 = 7.82; C52 = 8.73; C62 = 9.75;
C13 = 1.00; C23 = 22.26; C33 = 7.48; C43 = 3.67; C53 = 25.39.

The capacitor spread is 87.10. The waveform of the two nonoverlapping clock signals Φ1
and Φ2 is sketched in Fig. 2. For the sampling frequency fc = 100 kHz the ideal filter
passband characteristics are: lower passband edge 300 Hz, upper passband edge 3400 Hz,
maximum passband ripple |Δa| < 0.4 dB.

Let us suppose that the capacitors and the switches are ideal. The op amps are assumed to
be ideal except for a finite voltage gain described by the relation A(jω) = -A0 and a
nonzero input-referred dc offset voltage Vos.

The amplitude responses of the filter in the passband for two cases are shown in Fig. 3:

(i) Hid(f) - ideal op amps with A0 → ∞;

(ii) Hc(f) - nonideal op amps with A0 = 1000.

Fig. 2 Waveform of the clock

Fig. 3 Amplitude responses of the filter
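The output offset voltage in steady state for Vi = 0 and A0 → ∞ is:

$$V_{o}^{id} = \frac{C_{31}C_{51}}{C_{32}C_{52}}V_{OS1} - \frac{C_{51}}{C_{52}}\left(1+\frac{C_{31}}{C_{32}}\right)V_{OS3} + \left(1+\frac{C_{51}}{C_{52}}\right)V_{OS5} \qquad (1)$$

This gives $V_{o}^{id} = 1.601\,V_{OS1} - 3.552\,V_{OS3} + 2.951\,V_{OS5}$. By computer simulation for A0 = 1000
one obtains $V_{o}^{A_0} = 1.596\,V_{OS1} + 0.004\,V_{OS2} - 3.540\,V_{OS3} - 0.009\,V_{OS4} + 2.940\,V_{OS5} + 0.01\,V_{OS6}$.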
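
The ideal-case coefficients quoted above can be checked against the normalized capacitor values of this section. The sketch below assumes that the three coefficients are the capacitor-ratio expressions C31·C51/(C32·C52), -(C51/C52)(1 + C31/C32) and 1 + C51/C52. This mapping is inferred from the quoted numbers and is an assumption, as are the function and variable names.

```python
# Normalized capacitor values taken from the list in this section.
C31, C32 = 9.53, 11.61
C51, C52 = 17.03, 8.73


def ideal_offset_coefficients(c31, c32, c51, c52):
    """Coefficients of VOS1, VOS3 and VOS5 for ideal op amps (assumed form)."""
    k1 = (c31 * c51) / (c32 * c52)
    k3 = -(c51 / c52) * (1.0 + c31 / c32)
    k5 = 1.0 + c51 / c52
    return k1, k3, k5


k1, k3, k5 = ideal_offset_coefficients(C31, C32, C51, C52)
print(f"VOS1: {k1:.3f}  VOS3: {k3:.3f}  VOS5: {k5:.3f}")
```

With these ratios the computed coefficients agree with the quoted 1.601, -3.552 and 2.951 to three decimal places.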

III. Sixth-order bandpass SC filter with reduced capacitor spread

The capacitor spread can be reduced by replacing the first and the fourth conventional
integrators in Fig. 1 by two VLT integrators, proposed in [2,3]. The circuit diagram of the
resulting filter is shown in Fig. 4. The required clock signals are sketched in Fig. 2.
The values of the capacitors $C_{13}$ and $C_{43}$ that satisfy

$$\frac{C_{13}}{C_1}\cdot\frac{C_{13}}{C_{13}+C_1} = \frac{1}{87.10}; \qquad \frac{C_{43}}{C_4}\cdot\frac{C_{43}}{C_4+C_{43}} = \frac{3.67}{87.10} \qquad (2)$$

are: C13 = 9.846122; C43 = 19.807875. Hence, the capacitor spread is reduced to C3/C33 = 11.644.
The maximum passband ripple of the magnitude response Ha(f) (Fig. 3) for A0 = 1000 is Δa =
-0.652 dB at frequency f = 3.4 kHz.
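
The effect of the replacement on the capacitor spread can be reproduced from the normalized values. The sketch below is an illustration only: it assumes that only C13 and C43 change when the VLT integrators are introduced and that all other capacitors keep the values listed in Section II. The dictionary layout and the function name are hypothetical.

```python
# Normalized capacitor values of the conventional filter (Section II).
conventional = {
    "C1": 87.10, "C2": 71.01, "C3": 87.10, "C4": 87.10, "C5": 87.10, "C6": 68.26,
    "C11": 84.20, "C21": 11.90, "C31": 9.53, "C41": 13.04, "C51": 17.03, "C61": 18.00,
    "C12": 72.17, "C22": 16.70, "C32": 11.61, "C42": 7.82, "C52": 8.73, "C62": 9.75,
    "C13": 1.00, "C23": 22.26, "C33": 7.48, "C43": 3.67, "C53": 25.39,
}


def capacitor_spread(caps):
    """Ratio of the largest to the smallest capacitor value."""
    return max(caps.values()) / min(caps.values())


# Assumed: only C13 and C43 are changed by the VLT replacement.
with_vlt = dict(conventional, C13=9.846122, C43=19.807875)

print(f"conventional spread: {capacitor_spread(conventional):.2f}")
print(f"spread with VLT integrators: {capacitor_spread(with_vlt):.3f}")
```

After the replacement the smallest capacitor is C33 = 7.48, so the spread becomes C3/C33, as stated in the text.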

The output offset voltage $V_o$ in steady state is:

(i)
$$V_o = \ldots + \left(1+\frac{C_{51}}{C_{52}}\right)V_{OS5} = 29.93\,V_{OS1} - 3.552\,V_{OS3} - 0.186\,V_{OS4} + 2.951\,V_{OS5}; \qquad (3)$$

(ii) $V_o = 29.467\,V_{OS1} + 0.0708\,V_{OS2} - 3.493\,V_{OS3} - 0.613\,V_{OS4} + 2.903\,V_{OS5} + 0.046\,V_{OS6}$.

Compared to the original structure (Fig. 1), the offset term due to the first VLT
integrator is (1 + 2C1/C13) times larger. To reduce the effects of the op amp imperfections
(A0 and Vos), the first and the fourth VLT integrators in Fig. 4 are replaced by
gain-and-offset-compensated VLT integrators, proposed by Nagaray and Lin [2,3]. All the
other conventional integrators are replaced by FGI integrators [5]. The circuit diagram of
the resulting filter is shown in Fig. 5 (without the elements enclosed in the broken line).
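
As a worked check of the factor quoted above, the sketch below multiplies the VOS1 coefficient of the original structure (1.601) by (1 + 2C1/C13), using C1 = 87.10 and C13 = 9.846122 from the text. The variable and function names are illustrative.

```python
C1 = 87.10             # integrating capacitor of the first integrator
C13_VLT = 9.846122     # C13 after introducing the VLT integrator
K1_ORIGINAL = 1.601    # VOS1 coefficient of the original structure


def vlt_offset_factor(c1, c13):
    """Factor by which the first-integrator offset term grows: 1 + 2*C1/C13."""
    return 1.0 + 2.0 * c1 / c13


factor = vlt_offset_factor(C1, C13_VLT)
k1_vlt = K1_ORIGINAL * factor
print(f"factor = {factor:.2f}, VOS1 coefficient with VLT = {k1_vlt:.2f}")
```

The product agrees with the VOS1 coefficient of 29.93 quoted for the filter with VLT integrators.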

The values of the holding capacitances Chi (i = 1÷6) are Chi = C33 = Cmin = 7.48.

The maximum passband ripple of the magnitude response Hb(f) (Fig. 3) for A0 = 1000 is
Δa = -0.361 dB at frequency f = 300 Hz. The output offset voltage $V_o$ in steady state is:


(i) $V_o = -2.1032\,V_{OS2} + 0.502\,V_{OS4} - 0.093\,V_{OS6}$; (4)

(ii) $V_o = 0.03\,V_{OS1} - 2.101\,V_{OS2} - 0.0058\,V_{OS3} + 0.500\,V_{OS4} + 0.0047\,V_{OS5} - 0.0925\,V_{OS6}$.

The influence of the offset voltage VOS2 can be reduced by the insertion of a
gain-and-offset-compensated sample-and-hold buffer, enclosed in the broken line (Fig. 5).

Fig. 5 Gain-and-offset compensated bandpass SC filter with reduced capacitor spread

The magnitude response is computed for nonideal op amps with A0 = 1000. The maximum
passband ripple is Δa = -0.369 dB at frequency f = 300 Hz. On the scale chosen, the
corresponding curve practically coincides with the response Hb(f).

The output offset voltage in steady state is:

(i) $V_o = 0$;

(ii) $V_o = 0.03\,V_{OS1} - 0.0058\,V_{OS3} - 0.00015\,V_{OS4} + 0.0047\,V_{OS5} - 0.00006\,V_{OS6}$.

IV  Conclusion 

A 6th-order SC bandpass filter with reduced capacitor spread has been presented.
A gain-and-offset compensation technique was applied to improve its performance.

Abstract: The application of spectrally efficient modulation schemes, particularly in
cellular  radio  systems  requires  linear  but  highly  efficient  power  amplifiers.  Linear 
amplification  with  non-linear  components  (LINC)  is  a  promising  linearisation 
technique  for  improving  power  amplifier  efficiency.  In  this  paper  we  present  a  novel 
technique  for  correction  of  gain/phase  errors  inherent  in  LINC  transmitters.  The 
technique  employs  a  feedback  signal  to  generate  an  error  signal  with  the  aid  of  a 
reference-modulated  signal.  A  direct  search  algorithm  is  used  to  estimate  the 
gain/phase imbalance between the two amplifying branches for correction. A computer
simulation model is used to verify the validity of the technique. Simulation results
indicate out-of-band radiation suppression by 10 dB for a gain imbalance of 1 dB.

I.  Introduction 

The linear amplification with non-linear components (LINC) design method is
one of the promising techniques to achieve both power and spectral efficiency [1-4].
However, a major drawback of this technique is its inherent sensitivity to gain and
phase imbalances between the two amplifier branches [2]. The linearity of the LINC
transmitter relies on the AM-AM and AM-PM characteristics and the operating point of
the two amplifiers being identical. This is not always the case in practice due to
thermal drift, difference in electrical length and ageing. In particular, quantisation
error in the signal component separator (SCS) may cause changes in the amplitude of the
input signals to the two amplifier branches, resulting in an imbalance in amplifier
gain [3]. Depending on the amplifier characteristics, any change in gain may introduce
some degree of phase error between the two branches.

In [4] a method for the correction of the phase error was proposed, and in [2] a
technique was proposed for correcting both gain and phase along the amplifier
branches. The merit of the latter technique is that it relies on the information in
the out-of-band emission for the correction of the gain and phase imbalances.
However, its drawback is the accuracy of the optimisation of the complex gain and
phase imbalances, particularly if the phase error in the system results from a
combination of gain imbalance and difference in electrical length. Caution must be
taken not to suppress the out-of-band emission at the expense of in-band signal
quality. In this paper we propose a new technique that mainly cancels the gain
imbalance and automatically corrects the resulting phase error between the two
amplifier branches. In this technique we assume there is no phase error due to
difference in electrical length between the two amplifier branches.

The paper is organised as follows. In section II, we introduce the principles of the
LINC transmitter design. In section III, we present a mathematical description of the
proposed gain/phase error correction scheme. Then in section IV, the simulation
model of the LINC transmitter used to verify the validity of the proposed method is
described. Finally, in section V, we present simulation results and discussion, followed
by the conclusion in section VI.

II. Principles of LINC transmitter

The principle of LINC transmitter design is based on the fact that the complex
envelope of the RF bandpass signal, represented as

$$s(t) = r(t)\,e^{j\theta(t)}; \qquad 0 \le r(t) \le r_{max} \qquad (1)$$

can be split into two constant envelope signals

$$s_1(t) = s(t) + e(t) \quad \text{and} \quad s_2(t) = s(t) - e(t) \qquad (2)$$

such that $|s_1(t)| = |s_2(t)| = r_{max}$ and

$$s_1(t) + s_2(t) = 2s(t) \qquad (3)$$

where e(t) is a signal in quadrature to the source signal s(t) [3] and is given as

$$e(t) = j\,s(t)\sqrt{\frac{r_{max}^2}{r^2(t)} - 1} \qquad (4)$$

The two constant envelope signals in (2) are amplified separately in two amplifier
branches and recombined to give the output signal as in (3).
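
The separation into two constant envelope signals can be illustrated numerically. The following is a minimal sketch, assuming the usual quadrature signal e(t) = j·s(t)·sqrt(r_max²/|s(t)|² - 1). The function name `separate` and the handling of a zero-magnitude sample are assumptions made for illustration, not part of the described transmitter.

```python
import cmath
import math


def separate(s, r_max):
    """Split one complex-envelope sample s into two constant-envelope samples."""
    r = abs(s)
    if r > r_max:
        raise ValueError("sample magnitude exceeds r_max")
    if r == 0.0:
        # The formula is undefined at r = 0; use the limiting pair +/- j*r_max.
        e = complex(0.0, r_max)
    else:
        e = 1j * s * math.sqrt(max(r_max ** 2 / r ** 2 - 1.0, 0.0))
    return s + e, s - e


# Example sample with magnitude 0.3 and phase 0.7 rad, and r_max = 1.
s = cmath.rect(0.3, 0.7)
s1, s2 = separate(s, 1.0)
print(abs(s1), abs(s2))        # both magnitudes equal r_max up to rounding
print(abs((s1 + s2) - 2 * s))  # the sum recovers 2*s up to rounding
```

Both components lie on the circle of radius r_max, while their sum carries the amplitude and phase of the source sample.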

If there is no gain or phase imbalance between the two amplifier branches, then
during recombination the source signals s(t) in (2) add up in phase, whereas the
quadrature signals e(t) cancel out to give the output signal

$$S_{out}(t) = 2G\,s(t) \qquad (5)$$

where G is the small-signal gain of each amplifier branch. If a gain imbalance g
and a corresponding phase error $\varphi(g)$ exist between the amplifier branches, then the
output signal, from (3) and (5), is given as

$$S_{out}(t) = G\left[s(t)\left(1 + (1+g)\,e^{j\varphi(g)}\right) + e(t)\left(1 - (1+g)\,e^{j\varphi(g)}\right)\right] \qquad (6)$$
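
The effect of a branch imbalance on the recombined output can be sketched as follows. The sketch assumes that the first branch has gain G and the second branch has gain G(1 + g) with a phase error phi in radians, where g is a linear (not dB) fractional imbalance. The function names and the numerical values are hypothetical.

```python
import cmath


def recombine(s, e, gain, g=0.0, phi=0.0):
    """Combiner output for branch inputs s + e and s - e.

    Branch 1 has gain `gain`; branch 2 has gain `gain * (1 + g)` and an
    additional phase error `phi` (assumed imbalance model).
    """
    k = (1.0 + g) * cmath.exp(1j * phi)
    return gain * ((s + e) + k * (s - e))


def residual_quadrature(e, gain, g, phi):
    """Part of the output proportional to e: the uncancelled wideband term."""
    return gain * e * (1.0 - (1.0 + g) * cmath.exp(1j * phi))


s = cmath.rect(0.3, 0.7)            # example source sample
e = 1j * s * (1.0 / 0.09 - 1.0) ** 0.5  # quadrature sample for r_max = 1
G = 10.0

balanced = recombine(s, e, G)
g_1db = 10 ** (1.0 / 20.0) - 1.0    # 1 dB amplitude imbalance as a fraction
unbalanced = recombine(s, e, G, g=g_1db, phi=0.0)

print(abs(balanced - 2 * G * s))    # balanced case reduces to 2*G*s
print(abs(residual_quadrature(e, G, g_1db, 0.0)))  # nonzero wideband residue
```

In the balanced case the quadrature term cancels exactly; with an imbalance a scaled copy of e remains in the output, which is the source of the out-of-band radiation.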


In the frequency domain the signal components in (2) consist of a narrowband
source signal, S(f), and a wideband signal, E(f). In the event of gain/phase imbalance,
the latter spreads out into the adjacent band [2]. It is clear from (6) that the
gain/phase imbalance introduces both in-band distortion and out-of-band radiation.
Therefore, the transmitter output could be split with a 3 dB coupler, attenuated by the
linear gain of one of the amplifying branches (i.e. G) and lowpass filtered to recover
the distorted source signal as

$$z(t) \approx s(t)\left(1 + g\,e^{j\varphi(g)}\right) \qquad (7)$$

The difference signal is obtained by subtracting (1) from (7). Here, we employed
the criterion of uniform approximation. Thus, within the operation region of (1) the
magnitude of the difference signal is given as

$$|z(t) - s(t)| \le \Delta \qquad (8)$$

where Δ is a constant value, which in this particular case depends on g and r_max and is
given as

Clearly, from (9) a simple search technique can be used to optimise the gain
imbalance, g, under the constraint that


(10) 


This  technique  is  based  on  a  similar  method  used  in  [5]  for  Cartesian  loop 
lineariser.  The  main  difference  between  this  technique  and  that  in  [5]  is  that  we 
combined  what  looks  like  the  Cartesian  loop  lineariser  with  the  LINC  transmitter 
design  with  the  aim  of  controlling  the  gain/phase  difference  between  the  two  amplifier 
branches.  Furthermore,  in  this  method,  the  error  signal  generation  and  correction  will 
be  carried  out  at  the  RF  stage  without  down  conversion. 


IV. Simulation model


The baseband simulation model used is shown in Fig. 1. The source signal is QPSK
modulated and filtered with a square-root raised cosine filter with a roll-off factor
of 0.35. The sampling rate was 16 times the symbol rate (24.3 kHz). The source signal is
split into two constant envelope signals by the signal component separator (SCS), the
principle of operation of which is in accordance with (4) and (2), for amplification in
the two amplifier branches as shown in Fig. 1. The signal in the upper branch is fed
directly to the amplifier, PA1, whereas the lower branch signal is amplified by
g = 1 dB before being fed to the amplifier, PA2. A nonlinear amplifier model operating
near saturation was used for the branch amplifiers in the simulation. The amplifier
outputs are combined and coupled into a variable attenuator (ATT), which attenuates the
signal by approximately the amplifier linear gain (G), and then fed to a fifth-order
IIR Butterworth lowpass filter (LPF) to suppress the wideband signal. The filter
passband edge attenuation is 3 dB with the sampling frequency set at twice the symbol
rate. A reference-modulated source signal is then subtracted from the filtered signal
to generate an error signal as shown in Fig. 1. A direct search algorithm is used to
estimate the gain/phase imbalance for correction in the lower amplifier branch. The
search algorithm used here consists of program code based on (10); it directly
estimates the gain error g by determining the maximum magnitude reached, dropping
previous values. The algorithm rapidly converges to the value g.

Fig. 1. The baseband simulation model of the LINC transmitter with the linearisation technique.

V. Simulation results and discussion

The simulation results assume negligible distortion due to other RF components
(i.e. mixers, filters). The results are depicted in Fig. 2a and Fig. 2b. Figure 2a shows
the power spectral density (PSD) of the transmitter output when a gain imbalance of 1 dB
was introduced in the lower amplifier branch without the correction circuit. Figure 2b
shows the PSD when the correction technique was used. As we can see, a suppression of
the spectral regrowth by approximately 10 dB was achieved using the technique. This
method reduces the out-of-band radiation and in-band distortion as well as the noise
level at the transmitter output. The technique will consequently improve the
performance in terms of adjacent channel radiation and the signal-to-noise ratio in
communication systems employing LINC transmitters.


VI. Conclusion

A technique for improving the gain/phase imbalance correction in LINC transmitters
has been presented. A 10 dB suppression of out-of-band emission was achieved with a
gain imbalance of up to 1 dB between the amplifier branches. The accuracy of this
method suggests that separate circuitry for correction of gain imbalance and of phase
error due to difference in electrical length can be used where necessary, depending on
the application, design complexity and cost.

Fig. 2. a and b are the power spectral density of the LINC transmitter output without and with the correction technique.

Abstract: In this work we study optical wireless asymmetric communication. Specifically the
work  is  focused  on  IrDA  wireless  links  and  third  party  interference  to  an  already  established 
optical  link.  We  derive  bit  error  rates  at  the  physical  layer,  and  furthermore,  we  calculate  the 
average  throughput  of  IrDA  link  access  protocol  (IrLAP)  under  the  influence  of  third  party 
interference.  Results  of  the  BER  and  IrLAP  throughput  degradation  are  presented. 

Introduction 

Wired  communication  links  are  mostly  designed  to  be  symmetric.  By  symmetry  here  we 
mean  that  bi-directional  communications  over  the  same  link  are  of  the  same  quality.  Quality 
here  is  measured  by  low  bit  error  rate  (BER)  and  in  this  work  we  also  determine  the  link 
access  protocol  throughput  as  the  ultimate  measure.  In  wired  communications  symmetric 
communication  is  possible  because  noise  in  a  wired  medium  is  much  more  controlled  than  in 
wireless media. In wireless communications, and specifically in optical wireless links,
bi-directional link asymmetry has been observed [1], [3].

The ambient light noise, which directly influences the performance of IR links, is not
always the same for both communicating users. For example, a user working on a PC/laptop
and exchanging bi-directional information with another user may have a table lamp switched
on in close proximity to the receiving circuits, causing excess noise in his own receiver
compared to that of the other communicating user. Since it is possible and likely that
users are under the influence of differing quantities, and sometimes types, of ambient
noise, it should not be expected that the links are symmetric.

Furthermore,  component  specification  and  manufacturing  tolerances  are  present  even  in 
products  from  the  same  manufacturer.  Within  IrDA,  the  standard  specifies  components  within 
bounds, and it is likely that IrDA devices from various manufacturers would differ. If
transceiver parity [1] is not maintained, asymmetries would be observed.

Another scenario that would cause asymmetric bi-directional communication arises when
two identical users are linked and communicating under the same ambient noise conditions,
but a third user, unaware of the already established link, attempts to transmit. A 'third
user' or 'interferer user' transmission in the proximity may degrade one of the existing
link directions. This effect is accentuated by the fact that, due to manufacturing
tolerances on mass-produced devices from the same or different manufacturers, no two
transceivers are identical. It is therefore possible to have an interferer device with
higher transmitted power and a lower-sensitivity receiver than the other two devices. This
implies that the interferer may be unaware of the presence of the active link and, as a
result, continues to transmit, degrading one of the established link directions. This
causes asymmetric throughput. The effect is often called the 'deaf-man-shouting' problem.

In this paper we describe an analysis of the degradation of the bit error rate (BER) of
the affected link direction as a function of interferer parameters such as transmit power,
receiver threshold and spatial position. We present results of the BER as a function of
the 2D position of the interfering device. Characteristics of asymmetries of this type are
presented. We also present results of the IrDA link access protocol (IrLAP) throughput
degradation due to the asymmetry of link BER. The results presented are for a specific IR
link protocol, the IrDA IrLAP protocol, using a calculation of average packet end-to-end
transmission times and incorporating re-transmissions due to link errors [2], [4]. We show
that the throughput of the affected link can be seriously degraded by a
'deaf-man-shouting' interferer. This work focuses on understanding the various types of
asymmetries, and is important for the design of robust future optical wireless links.

Basic  system,  two  IR  device  model 

The IR users A and B are linked and exchange data. Figure 1 illustrates the model. User
A has transmitter Tx1 and receiver Rx1 and user B has Tx2 and Rx2. When A is transmitting
to B, the link distance 'd' is related to the other system parameters, for NRZ data, by:


(m  +  l).PTxlAr  cos'”  ff^.cos”  Gb 
4^2e(Pgami  +  P!!x2)p.BW.SNR 


0) 


where m and n are the Tx1 and Rx2 radiation pattern lobe indices, BW is the receiver
bandwidth and Ar is the receiver area. We assume here normalised radiation patterns of
the shape $\cos^m\theta_A$ and $\cos^n\theta_B$ respectively.


Figure 1: Two user IR link model. Tx1 and Rx2 are at distance d apart, subtended by angles θ_A and θ_B to d.

P_Tx1, P_amb^B and P_Rx2 are the transmitted, ambient and received optical powers, SNR is
the signal-to-noise ratio, ρ is the detector responsivity (in A/W), and e is the electronic
charge. Equation (1) describes the relation between the IR link distance and the SNR, which
in turn is related to the link bit error rate (BER), for fixed ambient light noise and
fixed other parameters. The established link between users A and B is symmetric about the
connecting line AB. Assuming the transmitting and receiving devices of A and B are the
same, or the transceivers have the same parity, the throughput is the same in both
directions AB and BA, resulting in symmetric bi-directional throughput, provided the
ambient light noise is not asymmetric.

Modelling  third  user  interference 

In this work, we assume that the ambient background noise sources (lamps, lighting of
various types) cause ambient noise in all devices present. We consider, in the same way as
shown in Figure 1, two users A and B connected and exchanging data. For the sake of clarity
we assume users A and B are identical and perfectly aligned, that is, the angles θ_A and
θ_B are zero. This assumption implies that the maximum link distance between A and B is
possible. The link between A and B is symmetrical. Extending the model, we assume the
presence of a third user C, located further from user B and invisible to user B, as
illustrated in Figure 2. The receiver of user A, however, will be affected by
transmissions from C. Transmissions from user C are therefore expected to degrade link BA,
due to interference from C to A. We assume C is located anywhere on the plane of AB,
pointing towards A. We further assume for simplicity that θ_C = 0, which means that C is
aligned with and pointing towards A. This may be the situation when C attempts to connect
to A while unaware of the existing link between B and A. The link AB, however, is
unaffected by C. Asymmetry in link throughput between AB and BA would therefore occur due
to transmissions from C towards A.

Figure 2: Interference by user C: User A is linked to B. User C interferes with user A.


Bit  Error  Rate  of  IrDA  links 

We  assume  that  the  transmissions  from  C  degrade  the  eye  diagram  of  the  link  BA  as 
shown  in  Figure  3.  The  magnitude  of  the  interference  from  C  is  shown  by  a  vector  in  the 
opposite  direction  to  the  signal. 


Figure 3: Eye diagram at receiver of user A. The interference from user C causes eye closure.

For user A, the probability of error Pe for the received data from user B, under the
influence of interference from C, is given by $P_e = Q(\sqrt{SNR})$. It follows that:

$$P_e = Q\left(\frac{\text{signal} - \text{interference}}{\sigma}\right) \qquad (2)$$

as illustrated in Figure 3, with Q being the complementary error function and σ the rms
noise amplitude at the receiver. Using the geometry of Figure 3, we can determine the
signal and interference strengths in equation (2).

$$\text{signal} = \frac{P_B\,R\,A_r\,(m+1)\left[\cos^n\theta_A\,\cos^m\theta_B\right]}{2\pi r_1^2} \qquad (3)$$

and

$$\text{interference} = \frac{P_C\,R\,A_r\,(m+1)\left[\cos^n(\theta_A-\theta_i)\,\cos^m\theta_C\right]}{2\pi r_2^2} \qquad (4)$$

where P_C is the transmitted power from the interfering user C, P_B is the user B transmit
power, R is the receiver responsivity in A/W, m and n are the transmitter and receiver lobe
cosine indices, and A_r is the receiver area in m².

Here R_A is the user A receiver threshold in W/m², and T_B, T_C are the transmit
intensities of user B and interferer user C respectively, in W/Sr.
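
The error probability under interference can be sketched numerically. The example below is illustrative only: it takes Q to be the Gaussian tail function Q(x) = ½·erfc(x/√2), treats r1 and r2 as the B-to-A and C-to-A distances, and uses made-up parameter values. All function names and numbers are assumptions, not values from this work.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2)) (assumed form)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def received_current(p_tx, resp, area, m, n, theta_tx, theta_rx, r):
    """Photocurrent for a cos^m transmitter lobe and a cos^n receiver lobe."""
    return (p_tx * resp * area * (m + 1)
            * math.cos(theta_rx) ** n * math.cos(theta_tx) ** m
            / (2.0 * math.pi * r ** 2))


def bit_error_probability(signal, interference, sigma):
    """Error probability when the interference closes the eye by its amplitude."""
    return q_function((signal - interference) / sigma)


# Hypothetical values: aligned users, B at 1 m and C at 3 m from A.
RESP, AREA, M, N = 0.5, 1e-5, 1, 1   # A/W, m^2, lobe indices
SIGMA = 2e-8                         # rms noise current in A
sig = received_current(0.04, RESP, AREA, M, N, 0.0, 0.0, 1.0)
itf = received_current(0.08, RESP, AREA, M, N, 0.0, 0.0, 3.0)

pe_clean = bit_error_probability(sig, 0.0, SIGMA)
pe_interfered = bit_error_probability(sig, itf, SIGMA)
print(f"Pe without interferer: {pe_clean:.3e}")
print(f"Pe with interferer:    {pe_interfered:.3e}")
```

Because the interference term falls with the square of r2, the error probability rises quickly as the interferer moves closer to user A.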


Results 

In order to demonstrate the effect of the interfering user C on the error rate of link
BA, we apply the above principles to IrDA optical links. We combine the error
probabilities derived from the model described here with a model of IrLAP (the link layer
protocol) to produce normalised throughput results. We have thus modeled both the physical
and the link layer of IrDA. As outlined earlier, user C degrades the quality of link BA. As
C approaches A, the degradation worsens until C senses that A is actively linked with B.
This happens at a distance called 'carrier sense' (CS). Carrier sense distance is the
minimum distance from A necessary for C to sense transmissions from A. If CS is long
enough, it stops C before it significantly degrades BA. However, we show here that if user
C transmits with an intensity of more than 80 mW/Sr, which is at the lower end of the
transmit intensity range specified by the IrDA standard, it is possible for C to destroy
the quality of link BA before it senses the presence of activity. The carrier sense
distance [3] for a typical IrDA link is taken here to be 2.3 meters.

The results of Figure 4 were derived with the assumptions that users A and B are
aligned and that the interfering user C is on the same line as AB, aligned with and aimed
at user A. This translates to θ_A = θ_B = θ_C = θ_i = 0. We can observe that as the
interfering user's transmitted power increases, the throughput of link BA deteriorates,
because the interference level increases and with it the bit error rate of link BA. The
results of Figure 4 have been derived for the IrLAP protocol of IrDA [4]. User A is
transmitting to user B with 40 mW/Sr intensity. When the intensity of the transmission
from C has reached 80 mW/Sr, the throughput is less than 0.2 before CS can detect activity
by user A. As the power increases further, the link throughput drops to zero. This
indicates that carrier sense is not sufficient to protect an existing link. IrDA versions
1.x do not have a carrier sense provision. We can determine from Figure 4 that a user C
identical to users A and B (40 mW/Sr), if it approaches user A to less than 2 m, will
noticeably reduce the throughput unless carrier sense is active. At 1.5 m distance it
would reduce the throughput to zero. Manufacturing component tolerances and the standard
transmitter band limits are wide enough to allow an interfering user to deteriorate or
even destroy the quality and throughput of an IR link.

Figure 4: IrLAP throughput of link BA when an interfering user C is along the axis AB.

Figure 5: Darkened area represents the location of interferer C relative to user A. User A is located at
the center of the polar plot, and user B is 1 m away. The darkened area represents a BA link
throughput of better than 0.9. The inner contours represent throughput = 0.5 and 0.2. The carrier sense
distance is the contour at 2.3 m along the zero degree axis. T_C = 60 mW/Sr and T_B = 40 mW/Sr.

In deriving the results of Figures 5 and 6, user C was allowed to move freely on a 2D plane
containing A and B. If carrier sense operation is active, as shown in Figure 5 (first
contour inside the outer one), it is sufficient to prevent throughput loss, since it
becomes active just as the throughput of BA is about to drop. Finally, when the transmit
intensity of the interfering user C is increased to 90 mW/Sr, still well within the IrDA
limits, Figure 6 shows that the throughput of link BA is zero at a distance of
approximately 2 m along θ = 0, without carrier sense. The chosen carrier sense distance of
2.3 m is not adequate to protect the link BA when the intensity of C is increased to
90 mW/Sr.

Figure 6: T_C = 90 mW/Sr, T_B = 40 mW/Sr. The carrier sense distance of 2.3 m along θ = 0 is not adequate
to prevent the throughput of link BA from dropping to zero.


Conclusion 

A model has been developed to describe the effect of third-user interference on an active
IR link. The model allows the position of the interfering third user to vary in 2D space. The
results of the analysis for an IrDA link indicate that it is possible for the interferer to shadow
the existing link. The throughput can be degraded to zero even when the intensity of the third
user's transmitter is as low as 80 mW/Sr, which is within the lower limit of an IrDA transmitter.
This demonstrates the importance of carrier sense as a means of deterring third users from
transmitting within the range of an existing IrDA link. There is no provision in the IrDA 1.x
standard for this kind of interference. However, in the AIR standard this is being recognized,
and carrier sense is part of the operation of the link.
Abstract. Code division multiple access (CDMA) is under consideration for
broadband video delivery over fiber-optic local area networks (LANs). The basic
principles of CDMA are presented, as well as a platform for dispersion
measurement of an optical CDMA coder and the measurement results.


I.  Introduction 

Broadband video has mostly been carried using microwave subcarrier multiplexing with
analog modulation schemes (VSB-AM, FM), because early technical implementations produced
a single analog lightwave containing the multiplexed video channels [1]. A digital
transmission system promises progress thanks to mature high-speed digital fiber-optic
transmission technology, and digital transmission is a better match to the fiber because
above a power level of 10 mW the fiber becomes nonlinear in its transmission characteristics.
The result is that fiber systems have up to 10 000 times (40 dB) less dynamic range than
wireline and microwave systems. This is compensated by a 100 times lower loss in fiber and by
using binary digital signaling rather than multilevel or analog modulation. Multilevel
modulation can achieve greater spectrum efficiency (> 1 b/s/Hz), which is necessary for video
transmission in narrowband systems [2], and offers a possibility of better handling a number
of parallel signals, but the requirements for the linearity of optical sources are greater and
specific distributed feedback (DFB) lasers must be employed.

A promising alternative is code division multiple access (CDMA), in which the data of a
video source are coded into uniquely addressed lightwave pulse sequences that can be
recognized and separated at the receivers [3]. Recent developments in fiber-optic
communications, such as fiber amplifiers, have removed the traditional limits on system
performance that were based on attenuation. As a result, pulse dispersion has been left as the
primary limiting factor and a major concern in very high bit rate optical systems [6]. Dispersion
is found to be particularly destructive in optical CDMA systems, whose codes require narrow
pulses on the order of picoseconds [4].

II.  Optical  CDMA 

CDMA is a method that allows several users to share the entire bandwidth of the
channel. Optical CDMA is accomplished by assigning each user a unique code, usually
consisting of a series of “chips” or narrow pulses. The codes are designed so that there is
minimal cross correlation between any two codes, as well as minimal correlation between each
code and any shifted or delayed version of itself, so that they produce a high periodic
correlation when properly matched and a low crosscorrelation with all other sequences.
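Let $(X) = (x_0, x_1, \ldots, x_{N-1})$ and $(Y) = (y_0, y_1, \ldots, y_{N-1})$ be two (0,1)
sequences. Their crosscorrelation is

$$\Theta_{XY}(k) = \sum_{i=0}^{N-1} x_i \, y_{i+k} \qquad (1)$$

and, with the complement of sequence (X) defined as $(\bar{X})$, whose elements are $1 - x_i$,
we look for sequences for which

$$\Theta_{\bar{X}Y}(k) = \sum_{i=0}^{N-1} \bar{x}_i \, y_{i+k} = \Theta_{XY}(k) \qquad (2)$$

A receiver that computes $\Theta_{XY}(k) - \Theta_{\bar{X}Y}(k)$ will reject the interference
coming from a user having sequence (Y).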
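
The correlation receiver described by (1) and (2) can be illustrated with a minimal Python sketch. It assumes that the index i+k is taken modulo N (a periodic correlation); the two toy sequences are chosen for illustration only and are not taken from the paper.

```python
def crosscorrelation(x, y, k):
    """Periodic crosscorrelation: sum over i of x[i] * y[(i + k) mod N]."""
    n = len(x)
    return sum(x[i] * y[(i + k) % n] for i in range(n))


def complement(x):
    """Complement of a (0,1) sequence: each element becomes 1 - x_i."""
    return [1 - xi for xi in x]


def receiver_output(x, y, k):
    """Correlation with the code minus correlation with its complement."""
    return crosscorrelation(x, y, k) - crosscorrelation(complement(x), y, k)


# Toy sequences for illustration only (assumed, not from the paper).
X = [1, 1, 0, 0]
Y = [1, 0, 1, 0]

interference = [receiver_output(X, Y, k) for k in range(len(X))]
matched_peak = receiver_output(X, X, 0)
print(interference, matched_peak)
```

For this pair the two correlations in (2) are equal at every shift, so the contribution of (Y) cancels in the receiver, while the matched sequence still produces a non-zero peak.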

The CDMA system has the advantage of relatively simple laser sources (without the need
for wavelength control), standard photodetectors (without narrow optical filters), and improved
power levels due to the laser pulsing. Matching of the code address by correlation, which can
be achieved both optically and passively, provides the major advantage of this system.

III.  Dispersion  Measurement  of  CDMA  Coder 

Dispersion is the broadening of pulses as they propagate through a given medium. If
the transmitted pulse spreads too much, it may become virtually undetectable and may
interfere with neighbouring pulses, making them inseparable (closing the “eye”) and the
overall signal unrecoverable. In a fiber-optic system there are three types of dispersion:
modal, material, and waveguide. Single-mode fibers (SMFs) eliminate modal dispersion, yet
propagating pulses still suffer from material and waveguide dispersion, jointly called
chromatic dispersion.

For chromatic dispersion measurement we use a unique variant of the so-called
“differential phase” method [5]. In a single-mode optical fiber the light travels with a
propagation delay $\tau(\lambda)$, which is proportional to the fiber length L and is a function
of wavelength. Chromatic dispersion $D(\lambda)$ is the derivative of the delay $\tau(\lambda)$
with respect to wavelength $\lambda$, and is a measure of the light pulse spreading observed in a
fiber of given length and for a given light spectral width. This derivative is very closely
approximated by the differential delay $\Delta\tau$ between two wavelength points separated by
$\Delta\lambda$ if $\Delta\lambda$ is not too large:

$$D(\lambda) = d\tau(\lambda)/d\lambda \approx \Delta\tau/\Delta\lambda \qquad (3)$$

This approximation is valid for all fibers and makes no assumption about the spectral
shape of the delay curve, other than that the dispersion is slowly varying and has no
discontinuities. The principle of the differential phase method is therefore to measure the
differential fiber group delay $\Delta\tau$ for two wavelengths separated by $\Delta\lambda$ and
to obtain a direct dispersion measurement without data processing.
Measurement of the delay $\tau(\lambda)$ is accomplished by measuring the phase shift $\phi$
imparted by the fiber delay upon the sinusoidal intensity modulation of a light source of a
specific wavelength $\lambda$ injected into the fiber:

$$\tau(\lambda) = \phi / (2\pi f) \qquad (4)$$

where f is the frequency of modulation.

The differential delay $\Delta\tau$ between wavelengths can thus be measured by detecting the
differential phase shift $\Delta\phi$, which is done with a method called double demodulation.
In this method a wavelength modulation is superimposed on the sinusoidal intensity modulation.
The wavelength switches from $\lambda_1$ to $\lambda_2$ at several hundred Hz. A high-frequency
phase detection system demodulates the intensity modulation using a reference signal derived
from the master oscillator and detects in turn the phases $\phi_1$ and $\phi_2$. The wavelength
modulation signal gives rise to a synchronous square wave from the phase meter whose a.c.
amplitude represents the differential phase and whose d.c. level represents the absolute group
delay at the mean wavelength.
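
The chain from measured phase to dispersion in (3) and (4) can be sketched in Python as follows. The numerical values are hypothetical, and the division by the fiber length is an assumption made here only so that the result has the units ps/nm/km used in Tab I.

```python
import math


def delay_from_phase(phase_rad, mod_freq_hz):
    """Eq. (4): delay tau = phi / (2 * pi * f), in seconds."""
    return phase_rad / (2.0 * math.pi * mod_freq_hz)


def dispersion_ps_per_nm_km(delta_phase_rad, mod_freq_hz, delta_lambda_nm, length_km):
    """Eq. (3) with the differential delay obtained from the differential phase.

    Dividing by the fiber length is an assumption made to obtain ps/nm/km.
    """
    delta_tau_ps = delay_from_phase(delta_phase_rad, mod_freq_hz) * 1e12
    return delta_tau_ps / (delta_lambda_nm * length_km)


# Hypothetical measurement values, for illustration only.
d = dispersion_ps_per_nm_km(delta_phase_rad=0.1, mod_freq_hz=100e6,
                            delta_lambda_nm=10.0, length_km=1.0)
print(round(d, 3))
```

With these assumed inputs the differential delay is about 159 ps, which gives roughly 15.9 ps/nm/km.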

In the experiment we used a code set of four sequences with weight w = 4 and length L = 25
bits. All fibers, including the delay lines, were standard 9/125 single-mode fibers; the coders
were passive hard-wired fiber-optic parallel delay line combiners requiring delays of 1-10 ns
(integrated circuit packages in the future). In one encoder the laser pulse is split into four
separate delay lines, as shown in Fig. 1(a); the OTDR readout from the coder is shown in
Fig. 1(b).


Fig. 1 (a) CDMA coder; encoder 1 implements the code (1,9,19,25) = 1000000010000000001000001.
(b) OTDR readout from the coder (time axis 0 to 10 ns).
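
The relation between the pulse positions (1,9,19,25) and the 25-chip sequence of encoder 1 can be checked with a short Python sketch. The function name is hypothetical, and the delays are expressed in chip times because the chip duration is not stated in the text.

```python
def code_from_positions(positions, length):
    """Return the (0,1) chip sequence with ones at the given 1-based positions."""
    chips = [0] * length
    for p in positions:
        chips[p - 1] = 1
    return chips


positions = (1, 9, 19, 25)
encoder1 = code_from_positions(positions, 25)
code_string = "".join(str(c) for c in encoder1)
weight = sum(encoder1)

# Delay of each of the four parallel lines relative to the first pulse, in chips.
relative_delays = [p - positions[0] for p in positions]
print(code_string, weight, relative_delays)
```

The weight of 4 corresponds to the four delay lines into which the laser pulse is split.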

In the double detection method of dispersion measurement we used a wide input signal from
an LED because the width of the optical source has no impact on the measurement. The results
given in Tab I show that the CDMA coder has very little impact on the overall dispersion in SMF,
but the impact is more noticeable at the zero-dispersion wavelength (1300 nm) than at the
1550 nm wavelength.
Tab I. CDMA coder dispersion measurement

                1550 nm                          1300 nm
Fiber length    D [ps/nm/km]    D [ps/nm/km]     D [ps/nm/km]    D [ps/nm/km]
                without coder   with coder       without coder   with coder
1 km            15.181          15.686           -1.705          (illegible)
5 km            17.004          16.970           -0.98           (illegible)
15 km           16.678          17.008           -1.113          (illegible)
25 km           16.747          17.214           -1.095          (illegible)

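
The size of the coder's effect at 1550 nm can be read directly from the two legible columns of Tab I. The following Python sketch only subtracts the tabulated values; the variable names are hypothetical.

```python
# Values copied from Tab I (1550 nm columns), in ps/nm/km, keyed by fiber length in km.
without_coder = {1: 15.181, 5: 17.004, 15: 16.678, 25: 16.747}
with_coder = {1: 15.686, 5: 16.970, 15: 17.008, 25: 17.214}

# Change in measured dispersion caused by inserting the coder.
changes = {km: round(with_coder[km] - without_coder[km], 3) for km in without_coder}
largest_change = max(abs(v) for v in changes.values())
print(changes, largest_change)
```

The largest change is about 0.5 ps/nm/km, which is small compared with the measured dispersion of 15 to 17 ps/nm/km.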

IV.  Conclusion 

This paper presents the impact of dispersion on an optical code division multiple access
(CDMA) coder using a single-mode fiber. A great deal of work has focused on reducing
dispersion in general single-user optical communication systems. Both electronic and
optical processing techniques have been explored. Electronic processing is, in general, too
slow for the very small pulse widths involved in CDMA systems. Optical processing is
powerful for single-user communications, but implementation on fiber that has already been
installed is difficult, and it lacks flexibility in multi-user situations where fiber lengths for
different users may differ. Dispersion can lead to significant intersymbol and interchip
interference, leading to unacceptable levels of multiuser interference. We can conclude from
the dispersion measurement of one CDMA coder that it does not noticeably increase the chromatic
dispersion that affects CDMA system performance. The measured values are still in the
range of the ITU-T G.652 recommendation, which defines single-mode fiber characteristics.
However, with the double detection method for chromatic dispersion measurement we cannot
change the signal rate and thus measure the dispersion impact on the BER characteristic, which,
together with changing the CDMA codes, would give the complete dispersion influence results.
Abstract. This paper presents an analysis of the potential of a code-division
multiple-access technique for optical communication systems. The main system
characteristics are discussed and a comparison to existing technology is given.


I.  Introduction 

The spread spectrum communication technique is the subject of intensive research in
the context of mobile and satellite communication, and it offers many advantages for
optical systems as well. The code division multiple access (CDMA) scheme uses the
principles of spread spectrum systems to combine the channels of individual users with high
efficiency. CDMA is based on assigning orthogonal codes to users, which results
in a substantial increase of the signal bandwidth. Therefore CDMA requires a broadband
communication channel such as an optical fiber. Moreover, CDMA requires broadband
signal processing in the receiver. Optical components currently under development will offer
much higher bandwidth than their electronic equivalents, and we might expect that
future optical networks will perform most of the signal processing in the optical
domain, which allows the advantages of CDMA to be fully exploited.

II.  System  components 

Optical CDMA [1] exploits the wide bandwidth of monomode fiber. It converts an electrical
signal of a lower rate to a high-rate sequence of optical pulses, providing
asynchronous access to the shared communication channel without additional control.
The data signal from the information source is usually represented in an on-off signaling
format, where an optical pulse symbolizes data bit 1. The output of the laser is then
encoded in the CDMA encoder, which converts every pulse to a sequence of pulses.
This sequence can be represented by a unipolar code consisting of L chips. At
the receiving end a correlation process is used to extract the desired data from the
received signal. The encoder as well as the correlation receiver can be implemented
optically, which allows high transmission rates to be achieved. Since a matched filter
is used in the receiver, local synchronized generation of the spreading code is
not needed for the correlation. An optical CDMA network then consists of N
transmitter-receiver pairs, where the assigned codes become network addresses. The receiver
must be able to recognize the desired signal in the composite received signal
containing the information of all active network users. Therefore the design of unipolar
codes with good cross-correlation properties is necessary. Another requirement
posed on optical CDMA codes is a good autocorrelation function with low
sidelobes. This makes it easy to distinguish the autocorrelation peak corresponding
to the end of a bit interval. Therefore optical CDMA with a tapped-line matched
filter and threshold detector does not require any synchronization.

Figure 1: All-serial architecture of a 2^n optical encoder (splitter, 2x2 switches, combiner).

Several classes of codes have been proposed for incoherent CDMA systems.
2^n codes [2], [3] are suitable candidates since their encoder and decoder architecture
provides substantially lower power loss and lower cost in comparison to the
conventional parallel architecture. With 2^n codes an all-serial architecture (Figure 1) is
used for code sequence generation, selection and correlation. In general, the encoder
is constructed from a number of switching stages, each with a 2 x 2 switch and a delay
line. Each 2 x 2 switch is configured into one of two possible states (mix-split or
straight-through) according to its DC bias voltage, controlled by the electronic address
selector. The mix-split state allows optical pulses to mix and split inside a 2 x 2 switch,
while the straight-through state allows optical pulses to pass through without changes. The
decoder consists of 2 x 2 passive couplers with suitable optical delays, similar to the
coder in Figure 1.
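
The effect of the two switch states on the generated pulse sequence can be illustrated with a highly idealised Python sketch. It tracks only pulse positions in chip times and ignores optical power splitting and all device physics; the stage delays are hypothetical values, not taken from [2], [3].

```python
def serial_encoder(stage_delays, stage_states, start=0):
    """Idealised all-serial encoder: return the sorted pulse positions in chips.

    stage_delays -- delay of each stage's delay line, in chips (hypothetical values)
    stage_states -- "mix-split" or "straight-through" for each 2 x 2 switch
    """
    pulses = {start}
    for delay, state in zip(stage_delays, stage_states):
        if state == "mix-split":
            # Each pulse is split; one copy passes through the delay line.
            pulses |= {p + delay for p in pulses}
        elif state != "straight-through":
            raise ValueError(f"unknown switch state: {state}")
    return sorted(pulses)


all_split = serial_encoder([1, 3, 7], ["mix-split"] * 3)
one_bypassed = serial_encoder([1, 3, 7], ["mix-split", "straight-through", "mix-split"])
print(all_split, one_bypassed)
```

In this simplified model, n stages in the mix-split state produce 2^n pulses from one input pulse, and switching a stage to straight-through halves that number, which is how the address selector changes the generated code.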

III.  Networking 

Optical CDMA offers a cost-effective method for the transport, provisioning and protection
of standard protocols such as ATM or Ethernet. CDMA meets all basic requirements for
an all-optical network. Due to its coding technique and all-optical signal
processing, CDMA ensures effective access to and utilization of a large portion of the
fiber spectrum. Because of the broadcast nature of CDMA, all information signals
are available anywhere in the network. By placing a receiver with a decoder matched
to the desired code at any location within the network, we are able to extract the needed
data, and all other information will be rejected. It should be noted that CDMA,
as a spread spectrum technology, provides secure communication. A programmable
transmitter encoder (or receiver decoder) allows the information to be directed to any
destination within the network. This feature makes it possible to reduce the size of the
network cross-connect equipment and eliminate the manual configuration hassle for the network
operator. Network protection can be achieved by transmitting the information
using two code sequences around the ring in opposite directions. This provides 1+1
protection, where the receiver can select one of the two received signals.

Figure 2: SDH and CDMA optical network infrastructure [4].

Figure  2  compares  the  network  infrastructure  of  the  typical  SDH  network  and  its 
CDMA  equivalent.  It  can  be  seen  that  CDMA  requires  fewer  pieces  of  equipment, 
saving  on  total  cost  and  decreasing  the  network  complexity  [4]. 

IV.  System  performance 

The use of unipolar {0, +1} pseudo-orthogonal codes with a non-zero cross-correlation
function in incoherent optical systems brings the problem of multiple-access
interference cancellation, since a simple correlation receiver (matched filter) ignores
the cross-correlation between the modulating signals of different users and therefore
performs poorly in a system composed of many users. Optimum multiuser
detection achieves important performance gains over conventional single-user
detection at the expense of computational complexity that grows exponentially
with the number of users [5]. Linear and nonlinear multiuser detectors, which approach
the performance of the optimum multiuser detector, have also been analyzed [6], [7].
Unfortunately, these detectors cannot be implemented optically.
Another possible way of multiple-access interference cancellation is the use of a
passive optical hard-limiter in the optical correlation decoder [8]. The optical
hard-limiter allows all-optical signal processing to be applied and therefore does not limit
the transmission rate.

With ultra-short optical pulses of 1-3 ps and code sequence lengths of thousands of
chips, CDMA can support user data rates of hundreds of Mbit/s. However,
multiple-access interference degrades the system performance and therefore limits
the total number of users accommodated by a CDMA network.
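
The order of magnitude of these data rates can be checked with a short Python sketch. It assumes that the chip duration equals the pulse width and that one data bit occupies exactly one code sequence; the specific code lengths are hypothetical examples of "thousands of chips".

```python
def user_bit_rate(chip_duration_s, code_length_chips):
    """Bit rate when one data bit occupies code_length_chips chips of the given duration."""
    return 1.0 / (chip_duration_s * code_length_chips)


rate_short = user_bit_rate(1e-12, 1000)  # 1 ps chips, 1000 chips per bit
rate_long = user_bit_rate(3e-12, 3000)   # 3 ps chips, 3000 chips per bit
print(rate_short / 1e6, rate_long / 1e6)  # rates in Mbit/s
```

Under these assumptions the rate per user lies between roughly 111 Mbit/s and 1 Gbit/s, consistent with the figure of hundreds of Mbit/s.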
Abstract. Harmonic linearization is a standard method for modeling
narrow-band systems. In the case of very high-Q resonators, using only one
harmonic as the input signal of the nonlinear element(s) may be sufficient. To discover
higher-order effects, more complicated models must be applied. This paper
deals with a method that uses large-signal models based on a virtual pure
sine wave. This allows the model to be split into two parts: one is specific to
the non-linearity and the other is a universal waveform transformation.


I.  Introduction 

Oscillator circuits exhibit behavior that can be characterized as "near harmonic". That means
that signals can be represented roughly as

$$u(t) = A(t)\cos(\omega_0 t + \varphi(t)) \qquad (1)$$

where $A(t)$ and $\varphi(t)$ vary slowly compared with the phase change caused by the term
$\omega_0 t$. The term "slow" is fuzzy, and during transients those "slow changes" may be
comparable with the "fast" ones. Separation of these two components is easy for long-period
samples of the signal and in steady state. Practical oscillator design needs very accurate
models for correct determination of oscillation frequency, phase noise, and reactions to
external excitations and control. Transient times become important, for example, in mobile
communications. In short-time processes, strict distinguishing of spectral bands may be
difficult, while direct time-domain simulation is too time-consuming and not accurate enough.

It is known that even simple oscillator structures (such as the Colpitts oscillator) may exhibit
chaotic behavior that may appear through a period-doubling process [1,5]. At the same time,
oscillator circuits are usually simple, consisting of only several components. It follows that
modeling of such circuits could be implemented using large equivalent circuits or systems of
equations.

It has been demonstrated that simple equivalent circuits can be built for large-signal models
that are based on harmonic linearization using the fundamental frequency only [2], and that
modeling of transients can sometimes be accelerated more than 1000 times. However, the frequency
control mechanism, implemented as a simple feedback system trying to minimize one of the
quadratic components [2,3], sometimes causes a fast decrease of the time step or locking to the
double frequency, both of which slow down the simulation. Not all reasons for this behavior are
clear, but one of them is obviously the non-deterministic signal representation (1), and probably
also the too rough model based on the fundamental frequency only. The consequence is that we
have to extend such models to include higher harmonics.
II. Basic ideas

Large-signal models are based on the following construction: we define a suitable signal space
(e.g. narrow-band signals) and build models that operate on signal parameters, not on
their instantaneous values. Assume the signal

$$u(t) = a_0 + a_1\cos t + a_2\cos 2t + a_3\cos 3t + a_4\cos 4t + \ldots \qquad (2)$$


to be represented as


$$u(t) = A_0 + A_1\cos(p(t)) \qquad (3)$$

in some finite interval of t. Considering p as a new independent variable (monotonic phase),
we can compute the result of a static non-linear transformation $y = f(u)$ in the p-domain. So,
the signal space is defined by two parameters $A_0$ and $A_1$. Applying $A_0 + A_1\cos p$ to
the input, we obtain the output

$$y(p) = B_0 + B_1\cos p + B_2\cos 2p + B_3\cos 3p + \ldots \qquad (4)$$

using pre-calculated formulas for $f(u)$. Now the question is how to return to the t-domain.
This procedure is universal, not depending on $f(u)$, and so we can avoid building very
complicated multi-tone models. From (3) we obtain

$$\cos p = (u(t) - A_0)/A_1. \qquad (5)$$

From this, we calculate $\cos p, \cos 2p, \cos 3p, \ldots$ to substitute into (4). The problem
is that the function $p(t)$ may be complicated. Fortunately, we do not need this function
directly, as $\cos np$ can be expressed in the following form:

$$\cos np = \cos(n \arccos(w)) \qquad (6)$$


where

$$w(t) = A_1^{-1}\,(a_0 - A_0 + a_1\cos t + a_2\cos 2t + a_3\cos 3t + \ldots) \qquad (7)$$

Expression (6) is the Chebyshev polynomial

$$\cos np = T_n(w), \qquad (8)$$

and for the truncated series (7), $\cos np$ can be computed by a finite number of operations:

$$T_{n+1}(w) = 2wT_n(w) - T_{n-1}(w) \qquad (9)$$

This together with (7) defines the universal waveform transformation.

Notes: 1) as seen from (9), $|\cos p|$ need not be limited by 1.0;

2) the described model can be applied to any time-domain segment: it may be exactly one
period of the signal, part of it, or several periods.
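
The transformation defined by (5) to (9) can be illustrated with a minimal Python sketch that works on single sample values rather than on series coefficients. The function names are hypothetical, and the quadratic non-linearity f(u) = u^2 is chosen here only because its coefficients B_n in (4) are easy to derive by hand.

```python
import math


def chebyshev_terms(w, n_max):
    """Return [T_0(w), ..., T_n_max(w)] using T_{n+1} = 2 w T_n - T_{n-1}."""
    terms = [1.0, w]
    for _ in range(2, n_max + 1):
        terms.append(2.0 * w * terms[-1] - terms[-2])
    return terms[: n_max + 1]


def transform_output(u, a0, a1, b):
    """Evaluate y = sum_n B_n cos(n p), with cos p = (u - A0) / A1 as in (5)."""
    w = (u - a0) / a1
    terms = chebyshev_terms(w, len(b) - 1)
    return sum(bn * tn for bn, tn in zip(b, terms))


# Assumed example: f(u) = u**2 applied to A0 + A1 cos p gives
# B0 = A0^2 + A1^2 / 2, B1 = 2 A0 A1, B2 = A1^2 / 2.
A0, A1 = 0.5, 2.0
B = [A0 ** 2 + A1 ** 2 / 2.0, 2.0 * A0 * A1, A1 ** 2 / 2.0]

y = transform_output(1.7, A0, A1, B)
print(y, 1.7 ** 2)
```

Because the recurrence is purely algebraic, the same code also works when |w| exceeds 1, in line with note 1.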


III. Implementation


A. Choice of $A_0$ and $A_1$

We are quite free in this choice; for example, we can set $A_0 = a_0$, which simplifies
implementation. We can also set $A_1$ to any positive value. The only criterion for these choices
is minimizing the number of components that have to be considered as non-zero. The optimal
solution is obviously

$$A_0 = \frac{\max u(t) + \min u(t)}{2}, \qquad A_1 = \frac{\max u(t) - \min u(t)}{2} \qquad (10)$$

However, detection of extreme values may be inconvenient, and therefore we can use estimates
based on rms values. The best practical solution tested is the following. Having from the
previous step a waveform w and its Chebyshev transformation, we solve equations to find $A_0$
and $A_1$ using the first components of the waveform only. This appeared to be fast and
productive.
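
The choice of $A_0$ and $A_1$ from the extreme values of the waveform can be sketched in Python on a sampled signal. The test waveform and the function name are hypothetical; the sketch illustrates only the extreme-value formula, not the iterative procedure preferred in practice.

```python
import math


def choose_offset_and_amplitude(samples):
    """A0 = (max + min) / 2 and A1 = (max - min) / 2 from sampled u(t)."""
    hi, lo = max(samples), min(samples)
    return (hi + lo) / 2.0, (hi - lo) / 2.0


# Hypothetical test waveform: u(t) = 1 + 2 cos t + 0.5 cos 2t over one period.
ts = [2.0 * math.pi * i / 400 for i in range(400)]
u = [1.0 + 2.0 * math.cos(t) + 0.5 * math.cos(2.0 * t) for t in ts]

a0, a1 = choose_offset_and_amplitude(u)
w = [(ui - a0) / a1 for ui in u]
print(a0, a1, min(w), max(w))
```

With this choice the normalised waveform w stays within the interval from -1 to 1, so cos p in (5) remains a proper cosine over the whole segment.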


B.  Waveform  representation  and  Chebyshev  transformation 


For analysis of transients, we cannot use the standard Fourier series, as this would create
equal values at both ends of the period. In general, they may be different, as seen from the
integration of a constant signal:

$$v(t) = v(0) + C^{-1}\int_0^t i_0\,d\tau = v(0) + C^{-1} i_0 t. \qquad (11)$$

However, this function can be represented using the half-frequency component $\cos\frac{t}{2}$.
Finally, we obtain the general form of the signals to be used as

$$u(t) = a_0 + a_{1/2}\cos\frac{t}{2} - a'_{1/2}\sin\frac{t}{2} + \sum_k (a_k\cos kt - a'_k\sin kt). \qquad (12)$$

This  form  consists  of  harmonic  components  only  and  therefore,  processing  is  rather  simple 
both  in  linear  and  non-linear  operations.  Description  of  integration  formulas  is  partly  given  in 
[4].  Those  and  details  of  Chebyshev  transformation  will  be  omitted  here. 


C.  Frequency  control 

The described method is usable for any time step. However, as the target is a narrow-band
system, we try to determine the fundamental period T (or an integer multiple of it) during
simulation. Considering T as a variable, we add one equation, e.g. demanding that the values of
some variable $x(t)$ at the ends of the period should be equal: $x(0) = x(T)$. This is
equivalent to forcing $a_{1/2} = 0$ for that variable. If the initial value of $x(t)$ was zero,
we shall follow the zero crossings of that variable.

IV.  Examples. 

D.  Example  1. 

Consider the simple circuit shown in Figure 1 [4].

Iterations were implemented in two sections as described above: the first section to determine
$A_0$ and $A_1$, and the second one to fix the waveform. The period was multiplied when the
half-frequency components became very small, and so fast progress in simulation was
achieved when reaching steady state. When the model uses the 1st harmonic only, the steady-state
period is equal to the resonance period of the LC tank, i.e. 2π. Using our
model, the value T = 6.29941436 was obtained, which is a difference of 0.26% or 2600 ppm.
This value of the period was confirmed by a special independent checking procedure.
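
The quoted difference can be verified with a few lines of Python, assuming normalised units in which the period of the LC tank is 2π as stated above.

```python
import math

t_first_harmonic = 2.0 * math.pi  # steady-state period of the first-harmonic model
t_model = 6.29941436              # period reported in the text

relative_difference = (t_model - t_first_harmonic) / t_first_harmonic
difference_ppm = relative_difference * 1e6
print(round(difference_ppm))
```

The result is about 2580 ppm, which agrees with the rounded values of 0.26% or 2600 ppm given in the text.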


Figure  1.  Simple  oscillator  circuit 


E.  Example  2 

In this example, we consider the Colpitts oscillator described in [5], which is able to exhibit
chaos. The purpose was to test the method on a circuit that probably does not have a fixed
period. This oscillator, with a simplified BJT model (piecewise-linear approximation as in [5]),
was modeled by Spice and by our method using a constant time step. We chose an integration
period of 1 μs and used up to 9 harmonics and 9 terms in the Chebyshev series. The result was
compared with that obtained from PSpice using two sets of control parameters: 1) reltol=1e-5 and
a time step limit of 0.1 μs (10 steps per our step), and 2) reltol=1e-6 and a step limit of
20 ns. All three cases gave almost the same result in the time segment [0...120 μs]. After that,
the results became different (Figure 2): the "Spice" simulation shows very different behavior.
The higher-accuracy "SpiceHA" result is much closer to ours, but it also starts to deviate after
150 μs. We do not know which is the exact solution, but this example demonstrates that our
method can be used to implement numerical integration in the general case using large time steps.


Figure  2.  Colpitts  chaotic  oscillator:  comparison  of  three  simulations 


V.  Conclusion 

A harmonic linearization model can be built using only a two-dimensional signal space based on
the monotonic phase signal representation. The method also allows numerical integration to be
performed with a large time step.
Abstract. An alternative approach to analog circuit design and analysis
problems is presented. The proposed technique is based on a small-signal
rational circuit function of a complex variable with real numerical coefficients.
The main idea is to build equivalent circuits for a given circuit
function under the control of the human designer. The architecture of the Synthesis
Calculator is presented along with data structures specially developed
to account for the needs of interactive synthesis.

Keywords:  analog  circuits,  interactive  computing,  circuit  functions,  circuit 
synthesis,  object  recognition,  object  oriented  programming. 

I.  Introduction 

A  quite  common  problem  in  engineering  practice  is  that  despite  all  efforts  the  circuit 
fails  to  behave  as  expected.  Then  one  may  turn  to  symbolic  analysis  [6],  [8],  which  is  very 
error-prone  and  produces  huge  expressions.  There  exist  however  several  methods  for 
symbolic  expression  simplification  [7]. 

Working with small-signal numerical circuit function(s) of complex frequency is much
easier. The objective here is to synthesise equivalent circuits for the circuit under
investigation. This calls for an interactive system for handling circuit functions, specially
adapted for performing the classical synthesis steps on the circuit functions. The system must
also be able to collect, interpret and present the circuit data obtained from the synthesis
actions.

In Section II, a motivation example for Interactive Synthesis is presented. In Section
III, principles of interactive synthesis are proposed. In Section IV, the Synthesis Object is
identified along with its internal structure. Next, in Section V, the Synthesis Graph and its
components are defined. In Section VI, the Synthesis Calculator is introduced, and the
conclusions are drawn in Section VII.

II.  Motivation  example 

A motivation example from [5] is adapted. The emulation of immittance using immittance
converters is very widespread in filter design. Most problems arise when emulating
an inductor: the bandwidth is narrower than that of an emulated capacitor. The principle of
immittance emulation is shown in Figure 1.

The idea is to state that the admittance $Y_{in}$, seen from the input, is the admittance of an
ideal inductor. The problem is then to synthesise $Y_{Load}$ so that this condition is
satisfied. However, one is interested in monitoring the behaviour of $Y_{in}$ instead of
$Y_{Load}$, for example to determine whether $Y_{in}$ is already acceptable so that the
synthesis of $Y_{Load}$ can be terminated early. This monitoring calls for a circuit analysis
program, so the IS tool should be integrated with an analysis tool.

III.  Interactive  Synthesis 

Interactive Synthesis (IS) has also been under investigation in earlier papers [2],
[3], [8] and a Master's thesis [1]. In this paper, the basics for the development of the
interactive synthesis tool are given. The interactivity implies a decision from the designer’s
side at each IS step.


Figure  1.  Immittance  emulation  example 

The parameters, results and other related data are saved to a Synthesis Object (SO) at each IS step, and the SOs are linked together to form a Synthesis Graph (SG). The IS data can be extracted from the vertices of the SG and presented in various forms: graphical plots, circuit diagrams, circuit netlists, etc.

The  principles  of  analog  circuit  IS  are  as  follows: 

1.  Every  calculation  or  synthesis  step  is  automated  and  initiated  by  the  designer; 

2. The initial data and results of every IS step are collected automatically in the SG;

3. The designer may traverse the Synthesis Graph in an arbitrary manner;

4.  The  designer  may  grow  or  reduce  the  SG  from  every  vertex; 

5.  The  complete  set  of  data  representation  methods  is  included  in  the  IS  tool. 

IV.  Synthesis  objects 

A Synthesis Object (SO) has a quite complicated internal structure (see Figure 2). The "input side" and "output side" distinguished in Figure 2 represent the data after performing a synthesis step (i.e., the input of the next step), and the results of the performed IS step, respectively.

V. Synthesis Graph

The SO in Figure 2 holds no information on the applied synthesis method, nor on the original circuit function (or set of functions) to which the method was applied. This information is made available by linking the SO into a lattice consisting of other SOs and of some additional data structures for managing the lattice. This lattice of SOs is called the Synthesis Graph (SG).

The SG is not necessarily a tree, because the synthesis problem has multiple solutions: different sequences of synthesis steps may yield the same result. The Vertices of the SG (VSGs) are interconnected by Arcs of the SG (ASGs).

A.  Vertex  of  the  SG 

The structure of the Vertex of the SG (VSG) is presented in Figure 3, a. The VSG holds information on the SO it encapsulates and on the placement of the VSG in the SG. References to the nearest neighbouring VSGs are incorporated. The information on the synthesis step performed on the circuit function is not in the VSG, but in the Arc of the SG. In Figure 3, a, the references to the arcs of the SG are gathered into one group, although the arcs may be either starting from the particular VSG or ending at it.

There is one special VSG in the structure of the SG, called the Start Vertex. This VSG holds nothing but the names of the input nodes of the circuit to be synthesised on its output side, because no synthesis step has been performed yet. Each VSG holds the information on the circuit function(s) after performing a particular synthesis step, so that it can be deleted without affecting data in other VSGs. All connecting arcs are deleted when a VSG is deleted.

Figure 2. Structure of Synthesis Object


B.  Arc  of  the  SG 

The  structure  of  the  Arc  of  SG  (ASG)  is  presented  in  Figure  3,  b.  The  ASG  always 
connects  to  exactly  two  vertices  of  the  SG,  and  carries  information  on  the  synthesis  step, 
which  was  used  to  get  from  the  start  VSG  of  the  particular  arc  to  the  end  VSG  (See  Figure  3, 
b).  Thus,  the  ASG  has  a  direction,  and  the  SG  is  a  directed  graph. 
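
The vertex and arc structures described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names, the field names and the use of integer vertex identifiers are all assumptions made for illustration. It only shows that the SG is a directed graph whose arcs carry the synthesis step, that two routes may reach the same vertex, and that deleting a vertex removes its connecting arcs.

```python
from dataclasses import dataclass, field


@dataclass
class SynthesisObject:
    """Data stored at one IS step (field names are illustrative)."""
    input_side: dict = field(default_factory=dict)
    output_side: dict = field(default_factory=dict)


@dataclass
class Arc:
    """Directed arc (ASG): records the synthesis step between two vertices."""
    start: int
    end: int
    step: str


class SynthesisGraph:
    def __init__(self, input_nodes):
        self.vertices = {}   # vertex id -> SynthesisObject
        self.arcs = []       # list of Arc
        self._next_id = 0
        # Start Vertex: only the input node names, kept on the output side.
        self.start = self.add_vertex(
            SynthesisObject(output_side={"input_nodes": list(input_nodes)}))

    def add_vertex(self, so):
        vid = self._next_id
        self._next_id += 1
        self.vertices[vid] = so
        return vid

    def add_arc(self, start, end, step):
        if start not in self.vertices or end not in self.vertices:
            raise KeyError("both ends of an arc must be existing vertices")
        self.arcs.append(Arc(start, end, step))

    def arcs_at(self, vid):
        """All arcs touching a vertex, incoming and outgoing alike."""
        return [a for a in self.arcs if vid in (a.start, a.end)]

    def delete_vertex(self, vid):
        """Delete a vertex together with every arc that starts or ends at it."""
        del self.vertices[vid]
        self.arcs = [a for a in self.arcs if vid not in (a.start, a.end)]


# Two different step sequences lead to the same vertex c, so the SG is not a tree.
sg = SynthesisGraph(["in", "gnd"])
a = sg.add_vertex(SynthesisObject())
b = sg.add_vertex(SynthesisObject())
c = sg.add_vertex(SynthesisObject())
sg.add_arc(sg.start, a, "step A")
sg.add_arc(sg.start, b, "step B")
sg.add_arc(a, c, "step C")
sg.add_arc(b, c, "step D")
```

Keeping the step description on the arc rather than in the vertex mirrors the separation described in the text: a vertex can be removed without touching the data of its neighbours.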


(a)  (b) 

Figure  3.  Structures  of  the  Synthesis  Graph  Vertex  (a)  and  Arc  of  Synthesis  Graph  (b). 

VI.  Synthesis  Calculator 

The tool performing IS steps on given circuit function(s) is called the Synthesis Calculator (SC). Its structure and relationship to the SG are shown in Figure 4. The SC Core is built from two modules: the CommandSet (CS) arithmetic library and the CommandInterpreter (CI). The CS implements basic mathematical operations on polynomials and rational functions, and the CI, using the base functions from the CS, implements synthesis methods (see e.g. [1], [4]). The CI also extends the possibilities of IS by implementing a dedicated macro language, using primitive methods and functions from both the CS library and the CI itself. This will make adding new synthesis methods to the SC easier.
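
As an illustration of how a synthesis method can be built on polynomial arithmetic, the sketch below removes a pole at infinity from an impedance function, one of the classical synthesis steps. It is a hypothetical example, not the CS or CI code: the function names and the coefficient-list representation (highest power of s first) are assumptions.

```python
from fractions import Fraction


def strip_leading_zeros(poly):
    """Drop zero coefficients from the high-order end (keep at least one)."""
    i = 0
    while i < len(poly) - 1 and poly[i] == 0:
        i += 1
    return poly[i:]


def remove_pole_at_infinity(num, den):
    """Split Z(s) = num/den into k*s + Z1(s), with Z1 = new_num/den.

    Polynomials are coefficient lists, highest power of s first.
    Requires deg(num) == deg(den) + 1, i.e. Z(s) has a pole at infinity.
    Returns (k, new_num); the denominator is unchanged.
    """
    num = strip_leading_zeros([Fraction(c) for c in num])
    den = strip_leading_zeros([Fraction(c) for c in den])
    if den == [0]:
        raise ZeroDivisionError("denominator polynomial is zero")
    if len(num) != len(den) + 1:
        raise ValueError("Z(s) has no simple pole at infinity")
    k = num[0] / den[0]                    # series element value
    s_times_den = den + [Fraction(0)]      # multiply the denominator by s
    new_num = [n - k * d for n, d in zip(num, s_times_den)]
    return k, strip_leading_zeros(new_num)


# Z(s) = (s^3 + 3s) / (s^2 + 1) = 1*s + 2s / (s^2 + 1)
k, remainder = remove_pole_at_infinity([1, 0, 3, 0], [1, 0, 1])
```

Exact rational arithmetic is used here so that repeated removal steps do not accumulate rounding error; a real tool might prefer floating point for speed.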

The SC also includes a UserInterface (UI) module, which organises all data exchange between the user and the SC. In Figure 4, the UI module is named Graphical User Interface (GUI) and taps its information from both the input and the output of the SC core. The UI might be implemented using some standard application program interface (API). The implementation of the data structures and the basic modules (CS and CI) of the SC is object-oriented. The object-oriented approach to implementing SC data items such as polynomials and rational functions is briefly presented in [1].

Figure 4. Structure of the Synthesis Calculator

VII. Conclusions and Outlook


The interactive synthesis principles and one way of implementing them were proposed. The type of function that can be synthesised is restricted to rational functions with real coefficients. It is possible that some sequences of synthesis steps will come to be used more often than others. This will create demand for a module that simply carries out these steps for the designer, like a macro, thus simplifying the interactive synthesis process. Moreover, planning modules may appear that are capable of planning the actions to be taken by the synthesiser. This would turn the interactive synthesiser into simply a synthesiser.

Noise Properties of High-Order BP OTA-C Filter Structures

Abstract. Different realizations of 8th order band-pass filters using bipolar OTA-C structures are analyzed from the noise and dynamic range point of view. Noise voltage spectral density, RMS noise voltage and dynamic range are calculated for Cascade, Cascade of Biquarts, Follow-the-Leader-Feedback and Leap-Frog filter structures. The analysis results are compared to the results obtained from the filter structures using operational amplifiers. They are also confirmed by PSPICE simulation. The Leap-Frog 8th order OTA-C filter structure gives the best noise properties at small signal levels.


I.  Introduction 

A significant limitation in the realization of high-quality low voltage signal processing circuits can be the noise level produced by the circuits used. Particularly in high-order filter circuits, it is important to keep the noise level as low as possible. In the recent paper [1], various second-order OTA-C filter sections are analysed together with the contribution of each section element to the overall noise figure. In this paper the high-order structures Cascade (CAS), Cascade of Biquarts (CBQ), Follow-the-Leader-Feedback (FLF) and Leap-Frog (LF) are analysed. Some of these structures were developed in order to minimize sensitivities to mismatch of the filter element values. The question was whether the noise figures of low-sensitivity filter structures are good as well. The filters are designed using the OTA-C filter section (GFS) [2] as a building block. They are compared to filters built with single amplifier biquad (SAB) sections. Noise effects are calculated using the transfer functions with respect to each noise source [3]. The ratio of the maximum undistorted voltage level to the RMS noise within a specified frequency range is used as the dynamic range (D_R) measure [4].


II.  Noise  and  Dynamic  Range 

The noise models for resistors, operational amplifiers (OA), and operational transconductance amplifiers (OTA) used throughout the analysis are shown in Fig. 1. Resistors are represented by the Nyquist current noise model shown in Fig. 1a, while the OAs and OTAs are represented by the models shown in Fig. 1b and Fig. 1c, respectively, where E_n = 20 nV/√Hz for the OA, E_n = 8 nV/√Hz for the OTA and I_n = 0.01 pA/√Hz for both amplifiers.

OTA-C filter structures are designed using the general OTA-C filter section (GFS) shown in Fig. 2a. The transfer function of this section is

$$T(s) = \frac{V_{OUT}}{V_{IN}} = \frac{s\,(\ldots)}{s^{2} + s\,\dfrac{(\ldots)}{C_{2}} + \dfrac{(\ldots)}{C_{1}C_{2}}} \qquad (1)$$


The second-order SAB filter section, used in the design of the filter structures with operational amplifiers, is shown in Fig. 2b and its transfer function is

$$T(s) = \ldots \qquad (2)$$

Fig. 1. Noise model for: a) resistor, b) OA, c) OTA

Fig. 2. 2nd order filter sections: a) GFS, b) SAB

Using the filter blocks described above, an eighth-order BP filter is realized using four different types of 8th order structures: CAS (Fig. 3a), CBQ (Fig. 3b), FLF (Fig. 3c) and LF (Fig. 3d). Substituting the elements in these structures by the corresponding noise models, the transfer functions

$$T_{x}(s) = \frac{V_{OUT}(s)}{N_{x,in}(s)} \qquad (3)$$

are calculated, where $N_{x,in}$ is either a voltage or a current noise source. As a $D_R$ measure,

$$D_R = 20\log\frac{V_{OUT,max}}{E_n} \qquad (4)$$

is used. The numerator represents the maximal undistorted RMS output voltage, and the denominator is the RMS noise voltage within a specified frequency range, given by

$$E_n^{2} = \int_{\omega_1}^{\omega_2} V_n^{2}(\omega)\,d\omega \qquad (5)$$

$V_n^{2}(\omega)$ is the square of the voltage noise spectral density derived from all noise sources and their corresponding transfer functions, i.e.

$$V_n^{2}(\omega) = \sum_{k} |T_{i,k}(j\omega)|^{2}\,(I_n)_k^{2} + \sum_{l} |T_{v,l}(j\omega)|^{2}\,(V_n)_l^{2} \qquad (6)$$

The transfer function $T_{i,k}(j\omega)$ is actually a transfer impedance, i.e. the ratio of the output voltage to the input current of the k-th current noise source $(I_n)_k$, while $T_{v,l}(j\omega)$ is a voltage transfer function, i.e. the ratio of the output voltage to the input voltage of the l-th voltage noise source $(V_n)_l$. These transfer functions are too complex to be presented here. Using the above expressions, the noise voltage spectral density and the dynamic range defined previously are analysed.
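
The calculation described above, in which each noise source is weighted by the squared magnitude of its transfer function to the output and the total density is integrated over the band, can be sketched numerically. The following Python is a minimal illustration and not the authors' procedure: the function names are hypothetical, the integration is a simple trapezoidal rule, and the densities are assumed to be given per √Hz so that the integral is taken over frequency in Hz.

```python
import math


def total_density_squared(current_terms, voltage_terms):
    """Squared output noise density at one frequency.

    current_terms: iterable of (T_i, I_n), T_i a complex transimpedance in ohms
                   and I_n a current noise density in A/sqrt(Hz)
    voltage_terms: iterable of (T_v, V_n), T_v a complex voltage gain
                   and V_n a voltage noise density in V/sqrt(Hz)
    """
    total = sum(abs(t) ** 2 * i_n ** 2 for t, i_n in current_terms)
    total += sum(abs(t) ** 2 * v_n ** 2 for t, v_n in voltage_terms)
    return total


def rms_noise(freqs, density_squared):
    """Trapezoidal integral of V_n^2(f) over f, followed by a square root."""
    area = 0.0
    for k in range(1, len(freqs)):
        df = freqs[k] - freqs[k - 1]
        area += 0.5 * (density_squared[k] + density_squared[k - 1]) * df
    return math.sqrt(area)


# Illustrative case: one 8 nV/sqrt(Hz) voltage source with unity gain,
# integrated from 10 Hz to 100 kHz.
freqs = [10.0 + i * 99.99 for i in range(1001)]
density = [total_density_squared([], [(1.0 + 0.0j, 8e-9)]) for _ in freqs]
e_n = rms_noise(freqs, density)
```

For a real filter, the constant unity gain would be replaced by the frequency-dependent transfer functions of the structure, evaluated at each grid frequency.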


III.  Example 

Based on the above expressions, the voltage noise spectral density using (6) and the corresponding dynamic range defined by (4) and (5) are calculated for the following example. The 8th order filter with Chebyshev magnitude response is designed with a 4 kHz centre frequency, 1.8 kHz pass-band width, and 0.1 dB pass-band ripple. Filter parameters are given in Table I. Noise and dynamic range are calculated over the frequency range from 10 Hz to 100 kHz. The voltage noise spectral densities of the structures using OTA-C 2nd order sections are presented in Fig. 4, and those of the structures using SAB sections are given in Fig. 5. These results are in excellent agreement with the results obtained from the simulation program PSPICE. Particular attention in the design procedure of an OTA-C filter must be paid to the dynamic range. For the D_R calculation the upper limit of the output signal has to be known as well. It is approximately 50 mV for OTA-C based filters [6], and approximately 5 V for SAB based filters [5]. An additional problem is the existence of more than one section output. During the design procedure dynamic range optimization has to be performed. This was done for all filters in our case, so that the maxima of the magnitudes at the outputs of the 2nd order sections are equal.


Tab. I. Parameters of 8th order filter structures, listed per section (1st to 4th)

Fig. 3. Eighth-order filter structures: a) CAS, b) CBQ, c) FLF, d) LF

Fig. 4. Spectral density for OTA-C structures

Fig. 5. Spectral density for SAB structures


Numerical results for the noise voltage spectral density, RMS noise voltage and dynamic range according to (4), (5), and (6) for each described structure are given in Tables II and III. Table II shows that the LF structure has the highest D_R. This is very convenient because the LF structure is superior from the sensitivity point of view as well. Using the presented GFS section, infinite pole Q-factors can be easily achieved. In contrast, for filters using SAB sections, the cascade structure has the minimal RMS noise voltage but also the maximal sensitivities. Which structure should be preferred depends primarily on the particular application.

Generally, the data in Tables II and III show that filter structures using OTA-C 2nd order building blocks have lower RMS noise voltage. On the other hand, the limitation on the input voltage level gives a lower D_R in comparison to the filters with operational amplifiers.

Tab. II. Noise voltage spectral density maximum, RMS voltage and D_R for GFS generated structures

Structure   (V_n)max (µV/√Hz)   E_n (µV)   D_R (dB)
CAS         0.1557              9.048      71.84
CBQ         -                   -          -
FLF         -                   -          -
LF          0.0794              5.707      75.84

Tab. III. Noise voltage spectral density maximum, RMS voltage and D_R for SAB generated structures

Structure   (V_n)max (µV/√Hz)   E_n (µV)   D_R (dB)
CAS         -                   -          -
CBQ         1.6734              -          -
FLF         1.3546              109.07     90.21
LF          1.1005              81.982     92.69
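
The tabulated dynamic range values can be cross-checked from the RMS noise voltages and the signal limits quoted in the text. The sketch below assumes that the 50 mV and 5 V limits are peak values of a sine wave, converted to RMS by dividing by √2; this is an assumption, adopted because it reproduces the legible entries of Tables II and III. The dictionary layout and function name are illustrative.

```python
import math

# RMS noise voltages E_n in volts, taken from the legible rows of Tables II and III.
E_N = {
    ("CAS", "GFS"): 9.048e-6,
    ("LF", "GFS"): 5.707e-6,
    ("FLF", "SAB"): 109.07e-6,
    ("LF", "SAB"): 81.982e-6,
}

# Signal limits quoted in the text; treating them as peak values is an assumption.
V_PEAK = {"GFS": 50e-3, "SAB": 5.0}


def dynamic_range_db(v_peak, e_n):
    """Dynamic range in dB from a peak signal limit and an RMS noise voltage."""
    v_rms = v_peak / math.sqrt(2.0)   # peak of a sine wave -> RMS
    return 20.0 * math.log10(v_rms / e_n)


for (structure, block), e_n in E_N.items():
    print(structure, block, round(dynamic_range_db(V_PEAK[block], e_n), 2))
```

With these inputs the computed values agree with the tabulated 71.84 dB, 75.84 dB, 90.21 dB and 92.69 dB to within rounding, which also illustrates why the SAB filters keep a larger dynamic range despite their higher noise: their signal limit is 100 times larger.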

If the price of the filter is important, SAB filters have the advantage. Table IV presents the number of amplifiers and D_R for the corresponding structures. Commercially available OAs are cheaper than OTAs. The number of OAs for the same 8th order filter configuration is about four times lower than the number of OTAs, which makes the SAB generated filters cheaper.


Tab. IV. Number of amplifiers and D_R for presented filters

Structure    No. of amplifiers   D_R (dB)
CAS (SAB)    4                   96.60
CBQ (SAB)    6                   90.76
FLF (SAB)    6                   90.21
LF (SAB)     7                   92.69
CAS (GFS)    16                  71.84
CBQ (GFS)    22                  74.64
FLF (GFS)    21                  73.28
LF (GFS)     25                  75.84

IV.  Final  Remarks 

Noise and dynamic range properties of different filter structures have been analysed. The OTA-C Leap-Frog structure gives the best noise figure. Since it is also the best structure from the sensitivity point of view, there is no need to compromise during the design procedure. A possible limitation is the filter price, because of the relatively high number of OTAs needed. In applications where the quality of the processed signal with respect to noise is important, the LF structure is the most suitable. CBQ follows the LF structure, having a good noise figure, relatively low sensitivities and a simpler design procedure.

The obtained results show that the Leap-Frog OTA-C filter structure has the smallest RMS noise and the best D_R among the OTA-C structures. However, it has less dynamic range than any of the SAB structures. Thanks to its better noise figure, it is suitable for low voltage signal processing applications.

Abstract. An all-fibre antenna using piezoelectric polymer coated circular
core  D-fibre  has  been  characterised  using  finite  element  analysis.  The  response  of 
the D-fibre antenna was determined over a wide frequency range from 1 MHz to 700 MHz. The modelling predicts an electric field induced phase shift of 2.43 × 10^-5 rad/(V/m) per metre at 5 MHz. At frequencies higher than 8 MHz the optical response is dominated by radial resonances of the D-fibre/coating composite.


I.  Introduction 

Optical  fibre  microcellular  networks  have  been  the  subject  of  extensive  research  over 
recent  years  mainly  due  to  the  low  loss  and  large  bandwidth  of  optical  fibre.  With  the 
increasing  requirement  for  high  capacity  mobile  multimedia  services,  transmission  systems 
incorporating  both  radio  and  optical  fibre  elements,  known  as  radio-over-fibre  (RoF)  systems, 
are  expected  to  find  an  increasing  role  in  telecommunication  networks  over  the  coming  years 
[1].  RoF  systems  rely  on  the  RF  subcarrier  to  modulate  the  optical  signal,  which  is  then 
distributed by optical fibre. A recent novel approach to generating an externally modulated optical signal for transmission through an RoF network has been to employ an optical fibre based antenna using piezoelectric polymer coated D-fibre [2]. The D-fibre antenna is used to phase modulate the lightwave with a received RF electrical signal.

The  D-fibre  as  shown  in  Figure  1  has  a  D-shaped  cross-section,  with  a  flat  surface 
parallel  to  the  longitudinal  axis  of  the  fibre.  The  unique  property  of  D-fibre  which  makes  it 
more  attractive  than  conventional  circular  fibre  for  sensing  purposes  originates  from  the 
greater  interaction  of  the  propagating  optical  field  with  the  external  space  on  the  planar  side  of 
the fibre geometry. In conventional circular fibre the optical field remains within the fibre structure due to the glass cladding layer, whereas in D-fibre the guiding region is closer to the outer surface of the fibre. Moreover, removing a small amount of the cladding
layer  from  the  flat  surface  will  bring  the  optical  field  (known  as  the  evanescent  field)  to  the 
surface.  In  this  way,  the  evanescent  field  allows  a  much  greater  interaction  between  the  optical 
lightwave  and  any  outside  perturbation,  thus  allowing  construction  of  a  far  more  sensitive 
device. 

Using FEA, this work presents, for the first time, a wide frequency response investigation of an optical antenna based on piezoelectric coated D-fibre. Results are presented showing the frequency response, from 1 MHz to 700 MHz, of the phase shift induced in a circular core D-shaped optical fibre jacketed with a transversely polarised piezoelectric material.

II.  FEA  Simulation 

The commercial software package Abaqus® was used to carry out the finite element modelling of the D-fibre antenna. A D-shaped optical fibre with a d-distance (flat surface to core distance) of 7 µm, a cladding diameter of 125 µm and a 20 µm thick piezoelectric coating was modelled using three-dimensional finite element analysis. The mesh in Figure 2 represents a symmetrical cross-section of the D-fibre/jacket composite. Each region representing the D-fibre core, cladding and piezo-jacket was meshed separately using linear brick elements.


Figure  2.  Finite  element  mesh  representing  symmetrical  cross-section  of  the  D-fibre  with  its  PVDF  coating 

Steady-state  dynamic  response  analysis  was  employed  to  compute  the  axial  and 
radial  strain  distribution  within  the  glass  D-fibre  resulting  from  the  converse  piezoelectric 
effect.  This  procedure  is  used  when  the  steady-state  response  of  a  system  is  required  as  it 
undergoes  excitation  by  harmonic  loading  at  a  given  frequency.  Such  analysis  is  usually  done 
as  a  frequency  sweep  by  applying  the  loading  (ac  voltage)  at  a  series  of  different  frequencies 
and  recording  the  response.  The  solution  provides  the  peak  amplitudes  and  phase  relationships 
of  the  solution  variables  (strain,  displacements,  etc.)  as  a  function  of  frequency.  Once  the 
strain  coefficients  are  known,  the  optical  phase  shift  resulting  from  both  the  change  in  fibre 
length  and  refractive  index  can  be  calculated  [3]. 
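
The last step, converting strains into an optical phase shift, can be sketched as follows. The paper refers to [3] for the calculation and does not print the formula, so the expression used here is an assumption: it is the commonly used strain-optic relation for an isotropic fibre, with a length term and a refractive index term. The function name and the default material values (refractive index and photoelastic constants typical of fused silica) are likewise assumptions.

```python
import math


def phase_shift(eps_z, eps_r, length, wavelength, n=1.458, p11=0.121, p12=0.270):
    """Strain-induced optical phase shift in radians (assumed relation).

    eps_z: axial strain, eps_r: radial strain (both dimensionless)
    length: interaction length in m, wavelength: vacuum wavelength in m
    n, p11, p12: assumed typical values for fused silica
    """
    beta = 2.0 * math.pi * n / wavelength          # propagation constant
    length_term = eps_z                            # change in fibre length
    index_term = -0.5 * n ** 2 * ((p11 + p12) * eps_r + p12 * eps_z)
    return beta * length * (length_term + index_term)


# Axially constrained case (eps_z = 0): only the radial strain contributes.
# 600 nm corresponds to an optical frequency of about 5e14 Hz.
dphi = phase_shift(eps_z=0.0, eps_r=1e-9, length=0.1, wavelength=600e-9)
```

In the axially constrained regime only the index term survives, so the sign and size of the response are set by the radial strain alone.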

III.  D-Fibre  Antenna  Response 

To  obtain  the  response  of  the  D-fibre  antenna  to  an  ac  electric  field,  a  frequency  sweep 
was  carried  out  over  the  range  of  values  from  1MHz  to  700MHz  as  shown  in  Figure  3.  In  this 
high  frequency  region  the  net  axial  strain  tends  to  zero  as  the  wavelength  of  the  acoustic 
waves  propagating  in  the  D-fibre  becomes  smaller  than  the  longitudinal  dimensions  of  the 
device,  hence  the  D-fibre  response  can  be  considered  as  being  axially  constrained  [4].  Thus  at 
high  frequencies  the  dominant  contribution  to  the  overall  phase  shift  is  only  from  the  radial 
strains  induced  by  the  electric  field. 

Figure 3. Finite element results showing optical phase shift as a function of the applied ac electric field frequency for a 10 cm length of D-fibre coated with PVDF polymer

An optical phase shift of 0.00173 rad/(V/m) was calculated at 1 MHz. At frequencies higher than 8 MHz, the response is dominated by radial resonances of the D-fibre/jacket composite as the acoustic wavelength becomes comparable to the radial dimensions of the device. A large number of radial resonance peaks are observed in the region from 8 MHz to 700 MHz. The first resonance peak is at ~8 MHz and the last at ~694 MHz.

IV.  D-Fibre  Antenna  Network 

A  convenient  detection  scheme  for  the  D-fibre  antenna  would  be  to  mix  the  light  beam 
at  the  output  of  the  polymer  coated  D-fibre  with  that  of  a  second  reference  fibre,  which  is  not 
sensitive  to  the  electric  field,  in  a  standard  interferometric  system.  Using  such  a  scheme  it  is 
assumed  that  in  a  mobile  microcellular  network  each  microcellular  station  will  include  one 
interferometric  D-fibre  antenna.  Furthermore,  in  the  proposed  antenna  network  it  is  assumed 
that  a  group  of  microcellular  antennae  are  connected  together,  and  fed  by  a  single  laser  diode 
as  shown  in  Figure  4.  The  minimum  detectable  phase  shift  for  the  network  is  defined  as  that 
for  which  the  total  signal  current  equals  the  total  noise  current  (i.e.  when  the  SNR  equals  1) 
thus  [2]: 


$$\Delta\phi_i = \left[\frac{4 h \nu\,\Delta f\,N_a^{3}}{\eta P_0}\right]^{1/2} \qquad (1)$$


where η is the quantum efficiency of the detector, P_0 is the average laser power, Δφ_i is the minimum detectable phase shift for the i-th D-fibre antenna, h is Planck's constant, ν is the frequency of light, Δf is the bandwidth of the detector and N_a is the number of D-fibre antennae within the network. Using the values h = 6.626 × 10^-34 Js, η = 0.5, ν = 5 × 10^14 Hz and P_0 = 10^-3 W, then for a network comprising two interferometric D-fibre antennae and assuming shot noise limited detection, the minimum detectable phase shift using eqn. (1) can be shown to be 0.146 µrad/√Hz.

Figure 4. Microcellular D-fibre antenna network

Figure 5. D-Fibre Network Sensitivity


The relationship between the number of D-fibre antennae and the shot noise limited minimum detectable phase shift of the proposed network is shown in Figure 5. As Figure 5 shows, increasing the number of D-fibre antennae raises the minimum detectable phase shift, resulting in an overall decrease in network sensitivity.
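
The quoted sensitivity figure and its trend with network size can be checked with a short calculation. The sketch assumes the shot noise limited expression Δφ = [4hνΔf·N_a^3/(ηP_0)]^(1/2); the power of N_a is an assumption, chosen because it reproduces the quoted 0.146 µrad/√Hz for two antennae with the parameter values given in the text. The function and constant names are illustrative.

```python
import math

H = 6.626e-34      # Planck's constant, J s
NU = 5e14          # optical frequency, Hz
ETA = 0.5          # detector quantum efficiency
P0 = 1e-3          # average laser power, W


def min_phase_shift(n_antennae, bandwidth=1.0, exponent=3):
    """Shot-noise-limited minimum detectable phase shift in rad.

    The exponent on the number of antennae is an assumption; exponent=3
    reproduces the value quoted in the text for two antennae.
    """
    return math.sqrt(4.0 * H * NU * bandwidth * n_antennae ** exponent / (ETA * P0))


# Minimum detectable phase shift per sqrt(Hz) for growing networks.
for n in (1, 2, 4, 8):
    print(n, min_phase_shift(n))
```

Whatever the exact exponent, the minimum detectable phase shift grows with the number of antennae sharing one laser, which is the loss of sensitivity described above.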

V.  Conclusions 

The wide frequency response of an all-fibre optical antenna comprising a circular core D-fibre coated with a transversely poled piezoelectric material has been demonstrated. Finite element modelling was employed to compute the strain induced phase shift over the frequency range from 1 MHz to 700 MHz. The FEA simulations predict a phase shift value of 0.00173 rad/(V/m) in the high frequency (axially constrained) region. At frequencies higher than 8 MHz the optical response is dominated by radial resonances of the D-fibre/coating composite. An antenna network has been analysed assuming shot noise limited detection, relating the number of D-fibre antennae to the network sensitivity. Because the D-fibre antenna is constructed entirely from dielectric materials, it can be used to receive radio frequency transmissions without distorting or disturbing the field lines: there are no metallic components to reflect or transmit radio frequency energy.

Abstract. In this paper we report on the analysis of two different
topologies  of  all-optical  encoders  for  O-CDMA  systems.  The  encoders 
consist  of  Y  and  X  fibre  splitters  and  single  mode  fibre  delay  lines.  The 
measured insertion losses of the realised encoders are 11 dB and 14.6 dB. The properties of a realised O-CDMA system with a two-stage ladder encoder and decoder are analysed.

I. Introduction

Among  the  different  means  of  achieving  high  capacity  networks,  time-division  multiple 
access  (TDMA),  wavelength-division  multiple  access  (WDMA)  and  code-division  multiple 
access  (CDMA)  have  received  significant  attention  in  recent  years  [1]. 

Optical code-division multiple access (O-CDMA) is a bandwidth utilisation scheme in which many users access a common channel simultaneously through the use of encoding. Each user employs a unique code to distinguish that user's signal from those of other users. Due to its advantages, CDMA has been a topic of research in recent years, primarily in the radio frequency domain, but also in the optical domain.

Security and privacy are increasingly important issues for many communication applications. While various algorithms can be applied to encrypt data electronically, their implementation may lead to an electronic bottleneck for high-speed data transmission. In contrast, optical coding schemes such as O-CDMA potentially provide high levels of security while remaining effectively transparent, since the encryption is performed optically, and hence are suitable for high bit-rate applications.

In this paper, we examine two different topologies of optical encoders for O-CDMA systems: a two-stage ladder and a single-stage star configuration. Both encoders are realised using Y and X fibre splitters. Generally, the first topology offers lower insertion losses and the second more possibilities for independent coding of the signal. Finally, the properties of the realised O-CDMA system with a two-stage ladder encoder and decoder have been analysed.

II. All-optical encoders for O-CDMA systems

Two types of all-optical encoders for O-CDMA systems have been studied. The encoders were realised using single-mode Y and X fibre splitters from AMP and FOCI, optical fibres produced by SIECOR and optical connectors and couplers from SII. The two-stage ladder encoder shown in Fig. 1 consists of two Y-splitters and one X-splitter. The lengths of the optical delay lines in the first and second stage of the encoder were 30 m and 60 m, respectively.

Fig. 1 Two-stage ladder type of the encoder


The second, single-stage star encoder consists of six Y-splitters (three in every 1x4 splitter) and four optical fibres of different lengths, as seen in Fig. 2. The first fibre delay line was 6 m, the second 12 m, the third 18 m and the fourth 25 m long.


Fig.2  Single-stage  star  type  of  the  encoder 


III. Measurement and results

The properties of the constructed encoders have been analysed using the experimental set-up in Fig. 3. While the properties of the encoder were being measured, the decoder was not connected in the circuit. A pulse generator type TR 307 (EMG Hungary), a high speed digital laser transmitter BCP model 51T-231 (1310 nm, 1 mW, 1.5 Gbps), a high speed O/E converter BCP model 310B (InGaAs APD, 20 dB linear amplifier) and an oscilloscope Le Croy 9362 (1.5 GHz, 10 GS/s) were used for measuring the properties of the encoders and of the realised O-CDMA system.

The insertion losses of the encoders were measured using a 5 ns input optical pulse with a period of 750 ns. The measured input pulse amplitude was 24.5 mV (1 mW of optical power).

In the two-stage ladder type of the encoder the average amplitude of the four output optical pulses was 1.94 mV. The delay between the first and the second optical pulse of the output coded signal was 150 ns, between the first and third pulse 300 ns and between the first and fourth pulse 450 ns. The difference in amplitude of the output pulses was lower than 0.3 mV. It was caused by the asymmetric splitting ratio in the optical splitters. The measured total insertion loss of the encoder is 11 dB, which is in good agreement with the expected value (10.8 dB).
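
The measured pulse spacing of the ladder encoder can be compared with the delays expected from the fibre lengths. The sketch below is illustrative only: the group index of 1.468 is an assumed typical value for single-mode fibre near 1310 nm, the quoted 30 m and 60 m are taken to be the path differences between the two arms of each stage, and the function names are hypothetical.

```python
C = 299_792_458.0      # speed of light in vacuum, m/s
GROUP_INDEX = 1.468    # assumed group index of single-mode fibre near 1310 nm


def fibre_delay_ns(length_m, group_index=GROUP_INDEX):
    """Propagation delay through a fibre of the given length, in ns."""
    return length_m * group_index / C * 1e9


def ladder_pulse_times_ns(stage_lengths_m):
    """Arrival times of the output pulses, relative to the undelayed path.

    Each stage either bypasses or traverses its delay line, so the output
    times are all subset sums of the stage delays.
    """
    times = [0.0]
    for length in stage_lengths_m:
        delay = fibre_delay_ns(length)
        times = times + [t + delay for t in times]
    return sorted(times)


pulses = ladder_pulse_times_ns([30.0, 60.0])
measured = [0.0, 150.0, 300.0, 450.0]
```

Under these assumptions the four pulses fall close to the measured 150 ns, 300 ns and 450 ns spacings; the second delay line being twice the first is what makes the four pulses equally spaced.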

In the single-stage star encoder the average amplitude of the four output optical pulses was 0.85 mV. The measured total insertion loss of the encoder is 14.6 dB, which corresponds to the expected value (14.4 dB). The higher total insertion loss results from the larger number of fibre splitters and optical connectors needed to realise this encoder topology. The delays between the first pulse of the output coded signal and the second, third and fourth pulses were 29, 60 and 117.6 ns, respectively.

Fig. 3 The block diagram of set-up for measurement
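
The quoted insertion losses follow directly from the measured pulse amplitudes. The short sketch below assumes that the detector output voltage is proportional to optical power, so the ratio is converted with 10·log10; the function name is illustrative.

```python
import math


def insertion_loss_db(v_in, v_out):
    """Optical insertion loss in dB from detector voltages.

    The detector voltage is taken as proportional to optical power,
    so the ratio is converted with 10*log10 rather than 20*log10.
    """
    return 10.0 * math.log10(v_in / v_out)


# Input pulse 24.5 mV; average output pulse 1.94 mV (ladder) and 0.85 mV (star).
ladder_loss = insertion_loss_db(24.5e-3, 1.94e-3)
star_loss = insertion_loss_db(24.5e-3, 0.85e-3)
```

These amplitudes give about 11.0 dB for the ladder encoder and about 14.6 dB for the star encoder, matching the measured values stated in the text.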


Fig. 4 The output signal of the decoder with the autocorrelation peak in the centre

Fig. 5 Detail view of the autocorrelation peak of the output signal

Finally, a simple O-CDMA system consisting of the two-stage ladder-type encoder
and an identical decoder (Fig. 3) was realised and its properties were analysed.

The output signal, with the autocorrelation peak in the centre of the output pulse
sequence, is shown in Fig. 4. The asymmetry in the amplitudes of the pulses of the output
signal is caused by asymmetry in the fibre splitters. A detailed analysis of the output pulses
shows that small differences in the lengths of the fibres in the delay lines cause a time shift
between the subpulses that form the autocorrelation peak, as can be seen in Fig. 5.
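
The shape of the decoder output can be illustrated with an idealised model. The sketch below assumes four equal unit-amplitude pulses on a uniform 150 ns grid and lossless, perfectly matched delay lines; the function name `correlate_pulse_trains` is hypothetical, and the real measurement deviates from this because of splitter asymmetry and fibre length errors.

```python
def correlate_pulse_trains(code, taps):
    """Discrete convolution of the encoder pulse train with the decoder delay taps."""
    out = [0] * (len(code) + len(taps) - 1)
    for i, c in enumerate(code):
        for j, t in enumerate(taps):
            out[i + j] += c * t
    return out


SLOT_NS = 150            # delay step of the two-stage ladder encoder
chips = [1, 1, 1, 1]     # idealised pulses at 0, 150, 300 and 450 ns

output = correlate_pulse_trains(chips, chips)
peak_slot = output.index(max(output))

print(output)                       # [1, 2, 3, 4, 3, 2, 1]
print(peak_slot * SLOT_NS, "ns")    # position of the autocorrelation peak
```

In this idealisation the seven output pulses rise to a central peak built from four coincident subpulses, which is why small delay-line errors show up as a time shift between subpulses inside the peak.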

IV.  Conclusion 

The properties of two types of all-optical encoders for O-CDMA systems have
been analysed. The measured insertion losses of the two-stage ladder-type encoder and of the
single-stage star encoder were 11 dB and 14.6 dB, respectively. The single-stage star encoder
offers higher variability and the possibility of different coding methods, for example coding
using optical switches [2].

Analysis of the realised O-CDMA system shows that an O-CDMA system with higher stability,
quality and reliability requires integrated all-optical encoders and decoders and the use of
very short optical pulses. This allows shorter delay lines to be used and results in higher
stability and accuracy of signal encoding and decoding.

Abstract. The use of HSPICE for GaAs MMIC design and optimisation is described. HSPICE
and MDS designs are compared. The use of the HSPICE F20 Standard Elements Library is shown.

Introduction 

The usual way to solve problems concerning the design, analysis and optimisation of
monolithic microwave integrated circuits (MMIC) is to use specialised software packages
for each particular task. However, it is also possible to use existing widespread SPICE-based
software, e.g. HSPICE, which can be used successfully for AC analysis in the frequency domain
as well. The building elements of an MMIC can differ in part from the usual elements of
classical integrated circuits. In practical design, however, the designer's choice of building
elements is constrained to the devices produced by the foundry. Reliable circuit design
requires well-defined and verified models of the elements used. They are usually supplied by
the foundry as an element library for specific microwave design tools, but not for HSPICE.

Device  Modelling 

In this work, an HSPICE library of basic MMIC building elements was implemented,
based on the GEC-Plessey-Marconi "GaAs IC Foundry Design Manual" [1] for process F20.
Since the F20 process is available in the EUROPRACTICE project, there are good prospects
for using this HSPICE F20 Standard Element Library [2] for the design of real MMICs,
especially for educational purposes.

Devices are modelled parametrically using the HSPICE .subckt and .param commands, with
geometrical and/or electrical parameters as input. The following building element models are
provided in the library: spiral inductors (5 kinds), line inductors (2 kinds), capacitors
(3 kinds), mesa resistor, lossless and lossy non-symmetric microstrip transmission lines
(8 kinds), microstrip to MMIC transitions (2 kinds), via and bondpad. Two-, four- and six-gate
F20 MESFETs with and/or without vias in the source electrode are used as the active
elements. Most of the models are driven by electrical or geometrical (layout) parameters,
all of which are optimisable. The parasitic properties track the optimisation or any change
of the input parameters or frequency. The transistor models are scalable, both linear
and non-linear. Noise properties are not modelled.

To illustrate a simple library element description, an example of the mesa resistor
subcircuit netlist is shown in Tab. I, where le=100 lu=12 w=69 dw=0 are the default resistor
layout parameters, which are optimisable. The equivalent circuit element values shown
in Fig. 1 depend not only on the resistor layout dimensions, but on frequency as well.


Fig. 1.: Mesa resistor equivalent circuit


.subckt mesa_r in out le=100 lu=12 w=69 dw=0
.param Rdc='(300*Le+180*Lu+1080)/(W-dW)'
.param Z='A_Z+B_Z*W+C_Z*W*W-D_Z*W*W*W'
.param p1='(A_E*W+B_E)'
.param p2='C_E*exp(log(W)*log(D_E))+E_E'
.param p3='10f*(Le+Lu)/(18*Z)'
c1 in 0 '(sqrt(p1*hertz*1n+p2)*p3)'
r in out Rdc ac='Rdc*(1+0.007*hertz*1n)'
c2 out 0 '(sqrt(p1*hertz*1n+p2)*p3)'
.ends mesa_r


Tab. I: Example of mesa resistor description


The device call is then as simple as a usual subcircuit call from the library:

x_mesa_r_name node1 node2 mesa_r le=100 lu=12 w=69 dw=0

All available devices in the F20 HSPICE library [2] are described in a similar manner.
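
As a quick check of the resistor expressions in Tab. I, the DC and AC resistance can be evaluated outside the simulator. This is a minimal sketch, not part of the library: the function names are hypothetical, the units of the layout parameters are taken as given in the netlist (the text does not state them), and `hertz*1n` is read as the frequency expressed in GHz.

```python
def mesa_rdc(le=100.0, lu=12.0, w=69.0, dw=0.0):
    """DC resistance in ohms, from the Rdc expression of the mesa_r subcircuit."""
    return (300.0 * le + 180.0 * lu + 1080.0) / (w - dw)


def mesa_rac(f_ghz, **layout):
    """AC resistance in ohms: Rdc * (1 + 0.007 * f), with f in GHz."""
    return mesa_rdc(**layout) * (1.0 + 0.007 * f_ghz)


print(round(mesa_rdc(), 1))       # 481.7 for the default layout parameters
print(round(mesa_rac(10.0), 1))   # 515.5 at 10 GHz
```

With the default parameters the model gives roughly 482 ohm at DC, rising by 0.7 % per GHz.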


Library  Elements  Verification 

To verify the validity of the library elements and the idea of using HSPICE for MMIC
simulation, a system approach was used. The three-stage low-noise amplifier (LNA) from
Fig. 5.49 in [3] was chosen as an example for comparison, because the same foundry was used.
The circuit diagram of the LNA used for this comparison is shown in Fig. 2. The originally
measured and simulated properties from [3] are plotted in Fig. 3.

Fig. 2.: Circuit diagram of the three-stage LNA (Fig. 5.49 in [3])


The same LNA from the circuit diagram in Fig. 2 was simulated using the
HP Microwave Design System [4] with the EUROCHIP GEC F20 library from ENSEA [5],
and using HSPICE with our F20 HSPICE Library. Both simulated results are shown in Fig. 4.


Fig. 3.: Originally measured and simulated properties of the LNA [3] (panels: gain, noise
figure and input match, each with measured and design curves, versus frequency in GHz)

Fig. 4.: Simulated properties of the LNA: HSPICE and HP MDS comparison

There is a difference between the MDS and HSPICE simulations, most noticeable in the S21
parameter (gain), which is mainly influenced by the feedback resistors R2 and R4. To quantify
the difference, the resistors were optimised so that the HSPICE and MDS S21 responses were as
close as possible. The new values found (in MDS) were R2 = 147 ohm and R4 = 195 ohm, which
differ from the original values by -26.5 % and -18.75 %, respectively. We then found that the
different way of entering resistors into MDS is responsible for this variance. When the
responses are compared with Fig. 3, a slight difference in S21 and a more significant one in
the S11 parameter (input match) can be recognised. However, the HSPICE simulation is closer
to the original one. The cause can be found in the discontinuities, which are taken into
account in MDS but neglected in HSPICE, and also in small differences between the libraries
(prescribed inductance values are used in HSPICE, which results in non-integer quarter-turn
numbers that the MDS library cannot accept).
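
The original resistor values are not quoted in the text, but they can be inferred from the optimised values and the stated percentage changes. The short sketch below does only this arithmetic; the function name `original_value` is hypothetical and the results are an inference, not data from the paper.

```python
def original_value(new_value, change_percent):
    """Value before a relative change of change_percent (in %) was applied."""
    return new_value / (1.0 + change_percent / 100.0)


r2_original = original_value(147.0, -26.5)    # about 200 ohm
r4_original = original_value(195.0, -18.75)   # about 240 ohm

print(round(r2_original, 1), round(r4_original, 1))
```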

When both our results are compared with the original one in Fig. 3, the HSPICE simulation
results show a more similar shape. The S21 peak value is nearly equal and S11 behaves as
discussed above, but the bandwidths differ significantly. The rising edge of the gain is
nearly the same, but the falling edge is shifted down by nearly 1 GHz. The main reason for
this narrowing is the different MESFET geometry: the total gate width is the same, but
8 × 75 µm devices are used in [3], whereas 6 × 100 µm devices are used in our work. Taking
this different geometry into account, we can conclude that HSPICE with our F20 library can be
used for MMIC design and optimisation.


Conclusion 

The use of HSPICE for GaAs MMIC design and optimisation has been described.
The HSPICE F20 library was verified using a system approach as well as foundry data. It was
shown that the HSPICE F20 Library can be used for MMIC design, especially for educational
purposes.


This  work  was  accomplished  in  the  frame  of  the  TEMPUS  program,  JEP  1565  and  the 
project  No.  1/4219/97  of  the  SGA. 

Abstract

This work deals with the simulation and investigation of intermodulation distortion (IMD)
in a multi-channel high-speed optical receiver including a travelling wave amplifier (TWA)
and a metal-semiconductor-metal (MSM) photodetector. The final simulated parameters of the
optical receiver were as follows: frequency bandwidth 19 GHz/1.3 dB, transimpedance 41 dBΩ,
large-signal gain 9.7 dB, and suppression of spurious products near 40 dB.

Introduction 

In the age of information technology, huge information transfer capacity is required.
For effective communication, a multi-channel, high-speed optical communication system
can be used to advantage. For this purpose, an optical receiver with a travelling wave
amplifier (TWA) [1] is suitable because of its extremely wide bandwidth. This type of optical
receiver has been designed and optimized using HSPICE [2], [3]. A metal-semiconductor-metal
(MSM) photodetector [4] and microwave MESFETs [5] were used as active devices.
The components employed in this design, except the MSM photodetector, are from the
GEC-Marconi foundry, process F20 technology [5]. For a multi-channel transmission system
with a wideband optical receiver, an investigation of intermodulation distortion and of the
rejection of harmonics is needed, especially for the 2nd and 3rd order IMD products. This can
be accomplished by a fast Fourier transform of the time-domain simulation results from HSPICE.

Optical  receiver  design  and  simulation 

For a multi-channel high-speed optical receiver, a wide-bandwidth amplifier is necessary.
The travelling wave amplifier (TWA) was chosen for its extreme bandwidth. The circuit
diagram of the TWA in our design is shown in Fig. 1.


Fig. 1 Circuit diagram of travelling wave amplifier for optical receiver

The construction type and the bias point of the MESFETs constrain the bandwidth of the
optical receiver. The simulated frequency response of the transimpedance Zr of the designed
optical receiver with TWA and MSM photodetector is shown in Fig. 2. In our case we have
achieved a bandwidth of 19 GHz. The linearity of the amplifier at 5.045 GHz can be seen in
Fig. 3.


Fig. 2 Transimpedance Zr frequency response


Fig. 3 Power transfer function


The final simulated parameters of our optical receiver design are as follows: bandwidth
19 GHz/1.3 dB, transimpedance 41 dBΩ, large-signal gain 9.7 dB, and 1 dB gain
compression point 9.7 dBm at 5.045 GHz.
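
To put the transimpedance figure on a linear scale, the sketch below converts between dBΩ and ohms. It assumes the usual convention that dBΩ means 20·log10(Z / 1 Ω); the helper names are hypothetical.

```python
import math


def dbohm_to_ohm(z_dbohm):
    """Convert a transimpedance in dB-ohm to ohms (20*log10 convention assumed)."""
    return 10.0 ** (z_dbohm / 20.0)


def ohm_to_dbohm(z_ohm):
    """Convert a transimpedance in ohms to dB-ohm."""
    return 20.0 * math.log10(z_ohm)


z_t = dbohm_to_ohm(41.0)
print(round(z_t, 1))   # 112.2
```

Under that convention, 41 dBΩ corresponds to a transimpedance of about 112 Ω.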

Investigation  of  intermodulation  distortion 

Any non-linear device can generate intermodulation distortion (IMD). This is defined as
the production of new spurious output signals, created by the non-linear combination of
two or more input signals mixing together. An intermodulation product depends on the
number of input signals mixed together and on which harmonics of those input signals have
been mixed [6].

For two input signal frequencies, the spurious signals are defined as follows:

$$f_s = \pm M f_1 \pm N f_2 \qquad (1)$$

where

$f_s$ - frequency of the spurious signal
$M, N \ge 1$ - coefficients
$f_1$ - frequency of the 1st signal
$f_2$ - frequency of the 2nd signal

When $M = N = 1$, the spurious responses are called second-order intermodulation
distortion products, and $f_3$ and $f_4$ (Fig. 4) are defined as follows:

$$f_3 = f_2 - f_1, \qquad f_4 = f_2 + f_1 \qquad (2)$$

When $M + N = 3$, the spurious responses are called third-order intermodulation
distortion products, and for example $f_5$ and $f_6$ (Fig. 4) are defined similarly:

$$f_5 = 2 f_1 - f_2, \qquad f_6 = 2 f_2 - f_1 \qquad (3)$$


The  case  of  two  input  signals  (two-tone  test)  is  seen  in  Fig.  4. 
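
The two-tone product frequencies can be tabulated directly from these definitions. In the sketch below the function name `imd_products` and the two example tones are hypothetical; the tones are simply placed 140 MHz apart, and frequencies are kept as integers in MHz to avoid rounding.

```python
def imd_products(f1, f2):
    """Second- and third-order two-tone IMD product frequencies (same unit as the inputs)."""
    return {
        "f3": f2 - f1,        # 2nd order, difference
        "f4": f2 + f1,        # 2nd order, sum
        "f5": 2 * f1 - f2,    # 3rd order, below the tones
        "f6": 2 * f2 - f1,    # 3rd order, above the tones
    }


products = imd_products(965, 1105)   # example tones in MHz
print(products)   # {'f3': 140, 'f4': 2070, 'f5': 825, 'f6': 1245}
```

For closely spaced tones the third-order products land one channel spacing below and above the pair, so on a uniform channel grid they coincide with neighbouring channels and cannot be filtered out.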


Fig. 4 Spectrum of the two fundamental signals f1 and f2, 2nd order IMD products
f3 and f4 and 3rd order IMD products f5 and f6 (axes: output power [dBm] versus
frequency [GHz])


For multi-channel transmission, a large number of frequencies, one per channel, is present
at the input of the transmission chain. Due to the non-linearity of the devices a great number
of spurious products is created, so an exact analysis is too difficult. However, simulation
tools can help significantly to overcome this problem. In our case we have investigated the
multi-channel high-speed optical receiver with TWA and MSM photodetector using a 32-carrier
phase-aligned matrix generator. The frequency range of the generator is from 0.825 GHz to
5.885 GHz with 140 MHz spacing between channels, except for channels that are switched off
within this interval. The current amplitude is 31.25 µA per channel. This corresponds to
5 mW of input optical power incident on the MSM photodetector with a sensitivity of
0.2 A/W at 5 V bias. The investigation of spurious products was made using HSPICE. The input
and output signal time responses were transformed by the fast Fourier transform into
frequency responses. The spectrum of the 32-carrier phase-aligned matrix generator (Vsource)
is displayed in the bottom part of Fig. 5. The simulated spectra of the input (Vin) and output
(Vout) signals of the optical receiver are shown in the top part of Fig. 5. The detail of
these spectra is shown in Fig. 6.
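
The per-channel current follows from the optical power and the detector sensitivity. The sketch below assumes that the total photocurrent is shared equally by the 32 carriers, which is an interpretation that reproduces the quoted figure; the function name is hypothetical.

```python
def per_channel_current(p_opt_w, sensitivity_a_per_w, n_channels):
    """Photocurrent amplitude per carrier, assuming an equal share for every channel."""
    return p_opt_w * sensitivity_a_per_w / n_channels


i_channel = per_channel_current(5e-3, 0.2, 32)   # 5 mW, 0.2 A/W, 32 carriers
print(f"{i_channel * 1e6:.2f} uA")               # 31.25 uA
```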

Fig. 5 Spectrum of input (Vin) and output (Vout) signal and spectrum of the 32-carrier
phase-aligned matrix generator (Vsource)


Fig. 6 Detail of spectrum of input (Vin) and output (Vout) signal

The rejection of the harmonics and of the 2nd and 3rd order IMD products is illustrated in
Fig. 5, and a detailed spectrum over a range of 2 GHz in Fig. 6. The contribution of the
amplifier transfer function non-linearity to the IMD is 2.5 dB, as shown in Fig. 6, and the
rejection of IMD is clearly seen to be near 40 dB. The IMD products are generated at the
non-linear input of the MESFETs.
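
The idea of reading an IMD rejection figure from the spectrum of a time-domain waveform can be sketched without a circuit simulator. The example below is not the HSPICE flow used in this work: it replaces the receiver by a hypothetical memoryless stage y = x + a3·x³, uses two unit-amplitude tones on exact DFT bins, and evaluates single spectral lines with a direct DFT. All numerical values are illustrative assumptions.

```python
import cmath
import math


def tone_amplitude(samples, k):
    """Amplitude of the spectral line at DFT bin k (single-bin DFT)."""
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
    return 2.0 * abs(acc) / n


N, K1, K2 = 256, 20, 23   # record length and tone bins (illustrative)
A3 = -0.01                # hypothetical cubic coefficient of the stage

x = [math.cos(2 * math.pi * K1 * i / N) + math.cos(2 * math.pi * K2 * i / N)
     for i in range(N)]
y = [v + A3 * v ** 3 for v in x]            # weakly non-linear, memoryless stage

carrier = tone_amplitude(y, K1)             # fundamental at f1
imd3 = tone_amplitude(y, 2 * K1 - K2)       # third-order product at 2*f1 - f2
rejection_db = 20.0 * math.log10(carrier / imd3)

print(f"IMD3 rejection: {rejection_db:.1f} dB")
```

For this cubic model the third-order line has amplitude (3/4)·|a3| for unit tones, so the rejection depends only on the assumed coefficient; in the simulated receiver it is set by the MESFET non-linearity instead.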

Conclusion 

A high-speed optical receiver employing a travelling wave amplifier has been simulated
and investigated in the frequency band 0.1-20 GHz to determine the intermodulation distortion
(IMD), a very important parameter for multi-channel transmission systems. The 1 dB gain
compression point was found to be 9.7 dBm of output power near 5 GHz. From the analysis of
the simulated spectrum we found that the spurious products originate mainly from the input
non-linearity of the transistors, and their suppression was found to be near 40 dB over the
full frequency band.

Abstract. A customisable node library, a special VHDL model, a programmed electronic
mock-up, and modern universal and original software and hardware means are necessary
components of the instrumental tools for the computer-aided design of microcontrollers. In
this paper the computer-aided design tools and the main characteristics of the NT80XX family
of single-purpose microcontrollers are presented. The described tools have been used in the
design of a large number of different microsystems.


I. Introduction

Microcontroller integrated circuits became widespread at the end of the 1970s [1].
Originally, these chips were universal in nature; the other components of the various
microprocessor systems determined the specific character of their use. However, as they grew
more complex, they turned into a typical example of programmable application specific
integrated circuits (ASIC).

At present, the experience of their development and use is being generalised. On this
basis microcontroller families are formed, and efficient computer-aided design tools are
created. Such families are oriented towards building single-purpose systems of a given class.
A customisable node library, a special VHDL model, a programmed electronic mock-up, and
modern universal and original software and hardware means are necessary components of the
instrumental tools for their computer-aided design.

Below, the main characteristics of the NT80XX microcontroller family, the composition of
the customisable cell library, the VHDL and PLIC models used, and the software employed are
described.


II.  Instrumental  tools  for  microcontrollers  and  microsystems  design 

We will introduce the instrumental tools using the example of a microcontroller of the
NT80XX family.

The existing CAD tools allow the following types of orders to be executed:

•  design and delivery of microsystems using an existing or a new variant of a
microcontroller of the NT80XX family;

•  programming and fabrication of a finished microcontroller of the NT80XX family, with
prior delivery of a VHDL and/or PLIC model;

•  design, programming and fabrication of a new microcontroller of the NT80XX family, with
prior delivery of a VHDL and/or PLIC model;

•  delivery of a VHDL and/or PLIC model of an existing microcontroller of the NT80XX family.

Tools from MENTOR GRAPHICS (Design Architect, AccuSim, QuickSim II, IC Station),
XILINX (Foundation) and ACCEL (P-CAD), together with original software [3], are used during
the design of microcontrollers and microsystems.

The design process flow of these microcontrollers is a typical CAD route for Application
Specific Integrated Circuits (ASIC) [4]. It contains the following design stages:

•  design  capture; 

•  VHDL  description; 

•  behavioural  simulation; 

•  logic  synthesis; 

•  gate-level  simulation; 

•  PLIC-model  synthesis; 

•  layout  synthesis; 

•  foundry  and  delivery. 

High speed and quality of design are secured by the use of a special microcontroller model.
First of all, it contains a VHDL description.

The VHDL model used is a universal description that can be adjusted to the architecture of
any single-purpose microcontroller of the NT80XX family. The chip-level description [5] was
chosen when writing it. The model has the following distinctive features.

The description of microcontroller behaviour is an “architectural body”. The corresponding
“processes” included in the model reflect the behaviour of each of the general hardware
blocks. The NT-


Fig. 1. Graph of NT-model processes

The description of a transformation algorithm is abstract in nature. To account for delay,
the result is transported by the final operator (operators) to the output signal (signals).
The adequacy of the interaction between blocks is ensured by the “bus commutation process”.
The model “standard package” contains the type descriptions, parameter values, subroutines,
etc. Changing the descriptions in the “standard package” and the “bus commutation process”
allows the VHDL-model to be tuned to the required microcontroller architecture. The program
of the microsystem being debugged can be introduced into the VHDL-model in binary or
assembler form (Fig. 2). Therefore, the use of the described VHDL-model ensures the accuracy
required for creating any specialised microsystem, together with a high modelling speed.


CalcCRC:    [--rp]          /* store return address
            b = [sp++]      /* bh=[sp++], bl=[sp++]
            t = b           /* T buffer address
            b = 0           /* bh=0, bl=0
            [--sp] = b      /* CRC beginning value
BuffCycle:  [--rp]          /* byte cycle
            al = [t++]      /* next byte
            work = al


Fig. 2. Example of assembler form program

The concrete technological and circuit realisation of the blocks (processes) is generated in
the next stages of the design flow. In doing so, the layout of the blocks and the whole source
symbolic plan-model of the chip are tuned. In the course of the design, one-to-one
correspondence between the described VHDL-model and the microcontroller PLIC-model is
ensured as well.


III. Main characteristics of NT80XX family

The NT80XX family is a set of application specific CMOS microcontrollers. These chips are
oriented towards control and data acquisition tasks in systems that are independent of AC/DC
power sources and have low power consumption. At present the family contains a few finished
chips that are programmed by the customer [2]. They are widely used in different
microsystems. This set may also be quickly extended according to customer wishes.

The architecture of a new microcontroller of this series can include the following set of
configurable blocks:

1.  The dual-stack RISC microprocessor core includes an ALU, a set of basic registers and a
command decoder. It executes the main commands in one processor clock. Five addressing modes
can be used for memory reference. The command set includes the following groups: data transfer
instructions; arithmetic and logical operations; branch instructions; a group of special
instructions. This block contains two stacks for the subroutine call mechanism: a data stack
and a return stack.

2.  The serial interface is half-duplex and asynchronous.

3.  The set of operating timers provides the possibility of automatic rebooting.

4.  The watchdog timer checks program execution.

5.  The input/output ports are programmable.

6.  The internal ROM is mask-programmed.

7.  The internal RAM is static.

8.  The electrically erasable reprogrammable ROM (EEPROM) allows the most important data to
be stored during power-off.

The architecture of any microcontroller allows the manner of use of the memory area
(internal/external) to be programmed. The customer may specify the range and size of each of
the described blocks.

The microcontroller NT8020 has the following characteristics:

■  Supply voltage ........................................ 2.7 to 3.9 V

■  Supply current in stop mode ........................... 5 µA @ 3 V

■  Supply current at low-speed operation ................. 10 µA @ 3 V

■  Supply current at high-speed operation ................ 1 mA @ 3 V

■  Internal clock frequency at Xin = 4 MHz ............... 2 MHz

■  Internal clock frequency at Xcin = 32 kHz ............. 16 kHz

■  Internal ROM size ..................................... 16K bytes

The  kit  of  accessible  hardware  blocks  is  extended  constantly. 


IV.  Final  Remarks 

The use of the described tools makes it possible to design ASICs in a short time. Orders
are executed jointly by the “High Technology - Scientific Centre” of the Byelorussian State
University and the "NT-Laboratory" enterprise. The tools have been used for the design of
several tens of domestic and foreign microsystems [6]. At present the design of the new
NT8030 microcontroller is completed. The computer-aided design tools are being improved
continuously.

Abstract: The specific design language properties of VHDL allow the designer to use routines
that result, on average, in a twofold improvement in the logic capacity of programmable
devices. CPLD and FPGA architectures call for technology-specific optimisation techniques,
including algorithms for state machines and glue logic, and module generation for data path
and arithmetic logic, to take maximum advantage of their unique architectures for significant
speed improvements and area reductions.

I.  Introduction 

The paper is dedicated to the description of circuit implementations of encoders and decoders
of error-control codes. To be technology independent, the descriptions are written in the
VHSIC Hardware Description Language (VHDL), which is becoming the universal means of
describing digital circuits. The hierarchical structure of VHDL is ideally suited to the
description of extensive electronic circuits and systems, which are needed to meet the
requirement of high-speed communication. Another, less visible, aspect of its application is
the support of testing of circuits and systems.

The design methods must be versatile and technology independent for CPLD (Complex
Programmable Logic Device), FPGA (Field Programmable Gate Array) and CMOS ASIC design.
Students and designers can quickly, efficiently and economically consolidate multiple designs
into one larger design, retarget a design, and use VHDL to accomplish their designs. The
programme packages can optimise the designs for area and/or speed. VHDL accepts designs
described as equations, truth tables or interconnection descriptions.

II.  Example  of  Error-Control  Architecture  Design 

We introduce an example of VHDL models of both the encoder of the systematic cyclic Hamming
(15,11)-code with generating polynomial $g(x) = x^4 + x + 1$ and the Meggitt decoder of this
code. A permitted error vector is used to test the error correction of the Meggitt decoder
model. The basic part of the systematic (15,11)-code encoder is the circuit for dividing by
the generating polynomial $g(x) = x^4 + x + 1$. The block diagram of this circuit is as
follows:


Figure 1: Block diagram of (15,11)-code encoder circuit

The circuit in Figure 1 consists of a Linear Feedback Shift Register (LFSR), which is active
for the first eleven clock steps. To complete the output code word, the remainder is appended
to the information bits. For this purpose the address of the multiplexers is changed for the
last four steps, so that the remainder is shifted through the “lower” multiplexer to the
output. During these four steps the output code word is completed. The address counter block
must be configured so that its address output A can be used for addressing both the “lower”
and the “upper” multiplexers.

After finishing a code word, the encoder is ready to generate another output code word once
the address counter is initialised. That is, it must make the multiplexers address the first
eleven steps of the next fifteen-cycle code word generation. The reset signal can be used for
this initialisation. Initialisation of the LFSR is not necessary, because it has been fed the
value “0” through the “upper” multiplexer. The value 0 on the feedback wire is necessary to
change the LFSR into an ordinary shift register. The block diagram of the (15,11)-code
encoder circuit is described to make the encoder circuit model easier to follow.

III. Encoder Circuit Model Composition

The first step of modelling is the description of behavioural models of the elementary parts
of the circuit. Entities of the flip-flop, EX-OR gate, multiplexer and counter are defined:

-- description of overall structural model

entity encoder is
  port (input_word, clock, reset : in bit;
        output_code_word         : out bit);
end encoder;

architecture encod_11_15 of encoder is
  constant ground : bit := '0';

  component ff_f
    port (d, c, r : in bit;
          q       : out bit);
  end component;

  component Ex_or
    port (In1, In2 : in bit;
          Out1     : out bit);
  end component;

  component mux_2_1
    port (I0, I1, a : in bit;
          y         : out bit);
  end component;

  component counter
    port (cl, res : in bit;
          out_a   : out bit);
  end component;

  -- internal nets; the printed listing omits these declarations
  signal q1, q2, q3, q4, d2, fb, oxor1, oxor2, count_out : bit;

begin
  -- note: as printed, oxor2 has no driver and oxor1 is not read
  p1: mux_2_1 port map (input_word, oxor2, count_out, output_code_word);
  p2: mux_2_1 port map (oxor2, ground, count_out, fb);
  p3: Ex_or   port map (fb, q1, oxor1);
  p4: Ex_or   port map (q4, input_word, d2);
  p5: ff_f    port map (fb, clock, reset, q1);
  p6: ff_f    port map (d2, clock, reset, q2);
  p7: ff_f    port map (q2, clock, reset, q3);
  p8: ff_f    port map (q3, clock, reset, q4);
  p9: counter port map (clock, reset, count_out);
end encod_11_15;


The model of the cyclic Hamming (15,11)-code with generating polynomial
$g(x) = x^4 + x + 1$ is written in VHDL. The overall circuit is modelled by a structural model.
It is possible to add some delays to the model description to enhance the readability of the
output signal waveforms during simulation.
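
The behaviour of the encoder can be checked with a small bit-level sketch. This is not a translation of the VHDL listing: the register wiring (feedback into the first stage and an EX-OR before the second stage) is derived here from g(x) = x^4 + x + 1 as an assumption, and the names `lfsr_encode` and `poly_mod` are hypothetical.

```python
G = 0b10011  # g(x) = x^4 + x + 1


def lfsr_encode(info_bits):
    """Bit-serial systematic encoding of 11 information bits into a 15-bit code word."""
    assert len(info_bits) == 11
    q1 = q2 = q3 = q4 = 0
    out = []
    for b in info_bits:                  # first 11 clocks: feedback loop closed
        fb = q4 ^ b
        q1, q2, q3, q4 = fb, q1 ^ fb, q2, q3
        out.append(b)                    # information bits pass straight to the output
    for _ in range(4):                   # last 4 clocks: feedback forced to 0
        out.append(q4)                   # remainder shifted out, highest power first
        q1, q2, q3, q4 = 0, q1, q2, q3   # register acts as a plain shift register
    return out


def poly_mod(value):
    """Remainder of a GF(2) polynomial (bits of an int) after division by g(x)."""
    for i in range(value.bit_length() - 1, 3, -1):
        if (value >> i) & 1:
            value ^= G << (i - 4)
    return value


word = lfsr_encode([1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1])
as_int = int("".join(map(str, word)), 2)
print(word, poly_mod(as_int))            # a valid code word leaves remainder 0
```

The two loops mirror the two multiplexer settings described above: eleven steps with the feedback closed, then four steps in which the remainder is shifted to the output.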


IV.  Decoder  Circuit  Description 

The construction of the Meggitt decoder is based on a property of cyclic codes: we can
concentrate on the last position of each received word w, which we correct or not according
to its syndrome. Then we make a cyclic shift of the word w and again examine the last
position, etc. After n cyclic shifts, all positions will have been corrected. For the
generating polynomial $g(x) = x^4 + x + 1$ we can compose the syndrome computing circuit for
the systematic cyclic Hamming (15,11)-code.

To correct an error we use the syndrome value 0001. For the received word to be corrected it
must be delayed by 15 steps. This delay is realised by a shift register with 15 flip-flops.
The correcting bit is generated by a circuit composed of two inverters and one four-input
AND gate, followed by an EX-OR gate.
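
The shift-and-test principle of the decoder can be sketched in software. The example below is an illustration under stated assumptions, not a model of the circuit in Figure 2: words are held as 15-bit integers with bit 14 as the last position, and the detection pattern is computed as x^14 mod g(x) instead of using the syndrome value quoted in the text, since that value depends on how the syndrome register is arranged. The names `poly_mod` and `meggitt_decode` are hypothetical.

```python
G = 0b10011   # g(x) = x^4 + x + 1
N = 15        # code length


def poly_mod(value):
    """Remainder of a GF(2) polynomial (bits of an int) after division by g(x)."""
    for i in range(value.bit_length() - 1, 3, -1):
        if (value >> i) & 1:
            value ^= G << (i - 4)
    return value


TOP_ERROR = poly_mod(1 << (N - 1))   # syndrome of a single error in position x^14


def meggitt_decode(word):
    """Correct at most one bit error in a 15-bit word given as an int."""
    s = poly_mod(word)
    for _ in range(N):
        if s == TOP_ERROR:                       # the error sits in position x^14
            word ^= 1 << (N - 1)
            s ^= TOP_ERROR                       # corrected word has zero syndrome
        word = ((word << 1) & 0x7FFF) | (word >> (N - 1))   # cyclic shift by one
        s = poly_mod(s << 1)                     # syndrome of the shifted word
    return word                                  # 15 shifts restore the alignment


message = 0b10110010101
codeword = (message << 4) | poly_mod(message << 4)
received = codeword ^ (1 << 6)                   # inject a single bit error
print(meggitt_decode(received) == codeword)      # True
```

Each pass through the loop corresponds to one clock of the hardware: test the syndrome, correct the last position if needed, then shift both the word and the syndrome.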


Figure 2: Block diagram of (15,11)-code decoder circuit


V.  The  Overall  Model  of  Decoder  Circuit 

The modelling starts from the description of behavioural models of the elementary parts of
the circuit. Entities of the flip-flop, EX-OR gate, inverter and AND gate are defined:

entity hamdec15ol is
  port (a, cl, rr : in bit;
        b         : out bit);
end hamdec15ol;

architecture hd15a of hamdec15ol is
  component Ex_or
    port (In1, In2 : in bit;
          Out1     : out bit);
  end component;
  component ShiftReg14
    port (in1   : in bit;
          cl, r : in bit;
          out1  : out bit);
  end component;
  component And4
    port (in1, in2, in3, in4 : in bit;
          out1               : out bit);
  end component;
  component Divider15
    port (a, cl, rr      : in bit;
          m1, m2, m3, m4 : out bit);
  end component;

  signal qq1, qq2, qq3, qq4, qq5 : bit;
  signal err : bit;

begin
  n2: Divider15  port map (a, cl, rr, qq1, qq2, qq3, qq4);
  n5: And4       port map (qq1, qq2, qq3, qq4, err);
  n6: ShiftReg14 port map (a, cl, rr, qq5);
  n7: Ex_or      port map (err, qq5, b);
end hd15a;

VI.  Conclusion 

The Meggitt decoder is realised as a structural model composed of behavioural models of its components. Some components (“Divider15”, “And4” and “ShiftReg14”) are themselves modelled structurally. The model was verified by functional simulation using the V-System software suite. The decoder corrects one error; when two errors occur, the decoder detects an error but cannot correct it.

Support for GACR project "Research and Development of Built-in Diagnostics Means of Integrated Circuits" (No. 102/98/1003) is gratefully acknowledged.
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Abstract.  In  this  paper  partial  domain  method  for  the  analysis  of 
various  homogeneous  and  non-homogeneous  shielded  microstrip  lines 
is presented. This method determines the electromagnetic field configuration, taking into account the interface between the two media and the boundary conditions.


I.  Introduction 

The formulation of this problem has two parts: first one must pass from the real object to a physical model, and then one must build the mathematical model of the adopted physical model. This method yields efficient computing algorithms for the electromagnetic field that take into account the various geometrical, homogeneous and non-homogeneous parts of shielded lines (figure 1).

Taking into account the transverse section of the shielded microstrip line, a model was created (figure 1) in which the interface between domains 1 and 2 coincides with the Ox axis, the origin of the axes is next to the strip edge, and for x = l we have an electric wall.

Figure  1.  Model  for  the  shielded  microstrip  line 

II.  Partial  Domain  Method 

In order to determine the propagation parameters of this guiding structure, one must solve partial differential equations such as the Helmholtz equation:

$\Delta\Phi + k^2\Phi = 0$,  (1)

which is the general equation for the longitudinal field components (Ez and Hz). Solving these equations in the domain represented by the transverse section of the waveguide, with the appropriate boundary conditions, one obtains the field structure inside the waveguide and hence the desired propagation parameters.

Proceeding from the Meixner model [2], we have the following condition at the point x=0:

$\{\varphi_n\}, \{\psi_n\} = O(x^{\alpha_0})$, for $x \to 0$,

where: $\alpha_0 = -1 + \tau_0$ for the tangential electromagnetic field components at the interface between the media (Ex, Ey, Hx and Hy);

$\alpha_0 = \tau_0$ for the longitudinal electromagnetic field components;

$\tau_0$ is the minimum positive solution of the characteristic equation that arises after substituting the proposed solution into the Maxwell equations [2].

After checking the orthogonality of the chosen system of functions, we can use the following expressions to approximate the electromagnetic field components Ex, Ez, Hx and Hz at the strip edge and at the interface between the two media:

M  =  T  1,/  7  r2»("W)  '•  (2) 


1  -u‘{x 


<P*.(*)  =  rT»Mx)) 


M7  hn  (*)  =  Uln+\  (W(*))  »  ^ 

where $T_{2n}$ and $T_{2n+1}$ are Chebyshev polynomials of the first kind;

$U_{2n}$ and $U_{2n+1}$ are Chebyshev functions of the second kind;

$u(x) = 1 + \frac{2x - w}{w - a}$,  $u: \left[\frac{w}{2}, \frac{a}{2}\right] \to [1, 0]$.

Then the continuity relations are written in order to determine the unknown coefficients of the Fourier approximation. Taking into account the solutions of the Helmholtz equation and the properties of the orthogonal functions, we obtain a linear algebraic system of equations [8].

ic&  +  td*D'=0,  (7) 

«=0  n- 0 

±ch,C„  +  ±dhlD„=  0  (8) 

where:c*=^P“A'£  KM’ 

,  P  v,  ,  ^(-1)"  Wfc.O'i) 


Yh'M’ 


'  _  P  V  ?  V  (-1)5  jVM 

c^T0V^"'akmh  n*  K,h)’ 
,■  _J_y„  „  ^(-06xl 

^kn  ,  2^^mnakm  Zj  »  2 


5=1  Yehm  (^() 


Xem(x) 


i .  /  _  /,  .  *,2  _  £.20  m  —  ir^  ■  n  fv  h.  and  F  are  the  Fourier 

em  =  —flm  —  ,  v  —  Kxm  ,  Xsm  “  Ka  SrsMv5  Kxm  >  Ukm  >  u»«  »  Ukm  hmn 

Xnm\x) 


coefficients. 

For the algebraic system to have solutions, it is necessary to determine the values of the propagation coefficient β that satisfy the “dispersion equation” [8]:

det = 0.  (9)

The Fourier series in the infinite system of equations are replaced, after analysing their convergence, by finite partial sums. The values of the propagation coefficient that satisfy equation (9) determine the configuration of the hybrid propagation modes of the shielded microstrip line.


Figure 2. The variation of the Fourier coefficient values for n = k = 1÷40 and m = 1÷40


To facilitate solving equation (9), we take n = k. The variation of the Fourier coefficient values a_km, a_mn, b_km and b_mn, obtained with a Matlab program, is presented in figure 2 for n = k = 1÷40 and m = 1÷40.


III.  Results.  Final  remarks 

We consider a real structure of the shielded microstrip line (w = 1×10^-3 m, a = 3.5×10^-3 m, y1 = 0.5×10^-3 m, y2 = 2×10^-3 m and εr2 = 9).

The values of the propagation coefficient that determine the configuration of the hybrid propagation modes are also computed with a Matlab program. The minimum frequency at which propagation occurs without loss is 38.188 GHz.

The configurations of the longitudinal components of the electric and magnetic field at 38.188 GHz are presented in figure 3. The transverse components are determined from the expressions for the derivatives of the longitudinal components.
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Figure  3.  The  variation  of  the  longitudinal  components  of  the  electric  and  magnetic 
field  in  network  points 
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Abstract.  The  chemical  sensors  are  recently  designed  also  by  Thick  Film 
Technology (TFT). Experience with the preparation of novel chemical sensors based on the conductivity and biosensor principles is described.

Special emphasis has been put on the design of gas sensors with respect to the production of suitable sensing parts for the automotive industry.

Sensors for special high-precision analyses can also be prepared by TFT; the main results obtained in the detection of drugs, poisonous substances and enzyme inhibitors with the advanced chamber thick-film substrate RING 4 are mentioned. Testing for the presence of drugs used in medicine can be seen as a new area of use for thick film chemical sensors. The introduction of such sensors to the market cannot be expected in the very near future; the possibilities of their use and the market estimation are discussed.


Keywords:  thick  film  technology,  electrochemical  sensor,  thick  film  sensor,  gas 
sensing,  chamber  thick-film  substrate 


I.  Introduction 

The detection of dissolved species is an important analytical measurement in a wide variety of industries. For example, the food and chemical industries require electrochemical sensors for dissolved species to control the rates of reactions occurring within process tanks by monitoring reactants and products; medicine requires frequent analysis of body fluids; environmental engineering and the military carry out routine analyses of poisonous substances and pollutants. Estimating the composition of various gases is valuable for many industrial branches, environmental monitoring, medicine and the military.

Recently, Thick Film Technology (TFT) has also been used to design electrochemical sensors, chemical sensors and biosensors. The advantages of TFT lie especially in the low-cost production of small and medium quantities of sensors that are well adapted to market requirements, with an acceptable design loop duration.
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Nowadays  one  can  distinguish  two  principal  groups  between  chemical  and 
electrochemical Thick Film Sensors (TFS). The first group comprises sensors based on the measurement of conductivity between (usually interdigitated) electrodes; the second comprises sensors based on another measuring principle (amperometric, potentiometric, ...) realized in a selected electrode system (two- or three-electrode systems, various reference electrodes, enzymatic electrodes, etc.). TFT also makes it possible to prepare integrated sensors, arrays of sensors and sensors equipped with electronic components.

Estimates for the sensor market presume a remarkable increase in demand in the so-called typical applications: industrial machines, process control, automotive, security, communication and telemetry. How well electrochemical TFS can be placed on the market is very hard to predict on a global scale, but recent research results already show a good chance of starting industrial production within a sufficiently short time horizon.

II.  Theory 

Sensors are transducers that convert a measured quantity into a signal. When they are designed as TFS, screen printing and firing procedures are employed. An overview of possible TFS types is given e.g. in [1], [2].

In most cases, conductometric gas chemosensors make use of a chemiresistor fixed on interdigitated electrodes. The chemiresistor, most frequently made from organic material, changes its conductivity in the presence of suitable chemical compounds, e.g. a reducing or oxidising gas. Gas chemosensors operate at elevated temperatures, up to 600 K. Poor selectivity of the response to a given gas in a gas mixture is usually one of the principal problems of chemosensors. Metal phthalocyanines (CuPC, FePC, CoPC, ZnPC, GaPC, ...) are a class of chemical compounds frequently used as chemiresistors.

The problem of insufficient selectivity, as in gas chemical sensors, can be treated by using a more selective sensing principle, as in biosensors [2], [3]. A biosensor is a type of chemosensor that uses biologically sensitive material to detect chemical species; enzymes, tissue samples and microbial cells can be used as the sensitive part of the biosensor. Biosensors, as a very specific type of chemosensor, are also designed as TFS. The main disadvantage of biosensors is the vulnerability of layers made from biological material.

III.  Conductometric  Gas  Sensor  for  Automotive  Industry 

The detection of pollutants contained in exhaust gases is expected to be required in the near future, which means designing reliable sensors for the analysis of nitrogen oxides (NOx), methane (CH4), carbon monoxide (CO) and carbon dioxide (CO2). When metal phthalocyanines are used as the chemiresistor, the significant factors of the phthalocyanine response that affect the functionality of the sensors are the central atom of the material and the operating temperature. We have put emphasis on obtaining good reproducibility of the sensor response to one selected gas, NO2. The best results have been reached with copper phthalocyanine (CuPC) as the chemiresistor. The layout of the chemosensor used (interdigitated electrodes, temperature sensing element and heater) on an alumina substrate (96 per cent) and the molecule of copper phthalocyanine (chemiresistor) are shown in figure 1. When printing the paste containing copper phthalocyanine, experience shows the best stability of conductivity changes for a squeegee pressure of 22.5 PSI (1.5 kg/cm2) and a squeegee speed of 10 mm/sec. The tested sensors were operated at a temperature of 433 K, and the dispersion of the response was evaluated for a series of 20 sensors, see figure 2.
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Fig.  1.  a)  Platinum  interdigitad  electrodes  (Heraeus  C3657)  and  temperature  sensing  element  (Heraeus 
LPA  88-11),  b)  heater  (Heraeus  C3657)  on  alumina  substrate  (96  per  cent);  c)  molecule  of 
CuPC 
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The  sensors  tested  have  shown  the 
dispersion of responses usually at the level of 8 per cent. The operating temperature of 433 K yields a faster response: ΔCond = ΔCondmax / 2 at t = 6 min, and the full response can be expected within 15 minutes. The long-term stability of the responses is also acceptable in comparison with previous designs operated at even lower temperatures.


Fig. 2. Time response of CuPC chemosensor to NO2, T = 433 K


IV.  Drug  Evaluation  by  Thick  Film  Chemosensor  Based  on  Biosensing  Principle 

The thick film substrate RING4 (Fig. 3) has been used to prepare a three-electrode biosensor for detecting a model drug, demecarium. The three-electrode system (working, reference and auxiliary electrode), placed on two alumina substrates (96%), borders the flow reaction cell. The working electrode, made from platinum paste (Heraeus C3657), is polarized at +365 mV vs. the Ag/AgCl reference electrode; the output signal is processed by a conventional potentiostat. The biosensing membrane contains AChE (acetylcholine esterase) from Electrophorus electricus (EC 3.1.1.7), A = approx. 450 IU/mg; the total activity of the enzyme membrane is approx. 20 IU. Electron transfer to the Pt electrode is realized by a carefully screen-printed CuPC-containing composite layer with the composition 6 (CuPC) : 89 (graphite) : 5 (acetylcellulose), a modification of [4]. The Ag/AgCl reference electrode has been prepared by electrochemical chloridization of a silver layer; finally, both layers are protected by KCl deposition. The lifetime of the reference electrode has been estimated at 6 weeks. The AChE inhibitor and drug demecarium has been evaluated under the following conditions: T = 298 K; cell flow 125 µl/min; 0.4 mmol/l of acetylcholine chloride as reaction substrate (Aldrich, Mr = 226.12, purity 99%) in 50 mmol/l phosphate buffer; time to reach the steady-state response for the substrate about 20 minutes; output current approx. 270 nA; output current decrease 4 nA/hour; output signal stability in steady state for at least 48 hours. The average duration of the inhibition effect of demecarium was 10 hours.
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Fig.  3.  a)  biosensor  substrate  RING4,  b)  decrease  of  Iout  with  respect  to  concentration  of  demecarium 




V.  The  Market  Implementation  for  the  Future 

Recent forecasts for the total sensor market estimate growth to 65 billion DM in 2001 from 49.7 billion DM in 1996; the automotive industry expects a massive increase of about 90 per cent. The expected distribution of the market around the world in 2001 is shown in fig. 4. The current situation in novel thick film designs gives good reason to expect a massive increase here as well, but there is the problem of the duration of the research phase: for example, the development of the thick film sensor for glucose diagnosis lasted about 14 years, after which production and improvement of the sensor proceeded very quickly. We therefore have to distinguish 1) TFS that are almost suitable for mass production (gas sensors for the automotive industry) and 2) other designs, such as biosensors, that will be introduced to the market after several years of improvement. In spite of this, these designs are very promising, and the demand for low-cost disposable diagnostic strips (i.e. TFS) in various branches will grow rapidly.
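
The quoted forecast can be restated as a growth rate. The figures below are derived only from the two market values given above; the assumption of uniform annual growth over the five years from 1996 to 2001 is made here for illustration and is not part of the forecast itself.

```latex
\[
\frac{65 - 49.7}{49.7} \approx 0.308,
\qquad
\left(\frac{65}{49.7}\right)^{1/5} - 1 \approx 0.055
\]
```

That is, about 31 per cent growth in total, or roughly 5.5 per cent per year under the uniform-growth assumption.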

VI.  Conclusion 
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Fig.4.  World  Sensor  Market  in  2001 


A brief description of two promising TFS designs has been given: a gas sensor for NO2 and a chamber two-substrate chemosensor based on the biosensing principle for determining the concentration of the drug demecarium (an AChE inhibitor). The prospects for market implementation of such designs are supported by the expectations for the global sensor market predicted for 2001; specific limitations valid for TFS have been briefly mentioned.
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Abstract.  In  this  paper,  we  present  an  environment  for  synthesis 
and  simulation  of  the  industrial  digital  system  composed  of  the  target 
microprocessor,  memory  and  hardware  devices.  We  use  C++  for  coding 
the  program  for  the  target  microprocessor  and  VHDL  for  describing  the 
operations  in  hardware.  The  presented  system  combines  C++  compiler 
and  VHDL  tools  for  simulating  the  design.  The  applicability  of  the 
environment  for  performance  estimation  of  the  designed  digital  system 
is  demonstrated. 


I.  Introduction 

Today, a typical digital system is composed of a specific microprocessor, memory and a hardware environment implemented either as full-custom ASICs or as programmable devices [1]. The traditional design flow for these systems starts with the high-level system description, followed by hardware/software partitioning, hardware and software implementation, and system integration into the target environment. Since errors in software and hardware are commonplace and in most cases encountered only once the system is integrated into the target environment, the software program and hardware devices are likely to be modified. This is a difficult and time-consuming step which may even lead to modification of the printed circuit board.

In contrast to the traditional design flow, HW/SW co-design enables simultaneous design of the hardware devices and the software program at the earliest stages of the design, which reduces design errors [2,3,4].

In this paper, we present our approach to the HW/SW co-simulation of telephone exchange boards developed and used in the telecommunication industry. We use C++ for coding the program for the target microprocessor and VHDL for defining the operation of the hardware devices. For the given HW/SW partitioning, the presented system combines C++ and VHDL tools for simulating the designed system. We demonstrate the applicability of the environment for estimating the timing performance of the designed digital system.
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Figure  1:  Digital  system  simulated. 

II.  Co-simulation  of  C++  program  and  VHDL  code 

The main idea is to simulate the digital system composed of the processor, memory and peripheral units (Figure 1a) on the PC. The microprocessor memory address map consists of RAM, ROM, FLASH and the registers of the peripheral units implemented in FPGA devices (Figure 1b). The simulation model of the digital system from Figure 1b is presented in Figure 1c. The behaviour of the microprocessor and the memory is described by the C++ program, while the operations in hardware (FPGAs) are defined in VHDL. The primary goal of our work is to automatically generate an interface between the C++ program and the VHDL simulator [5] by modifying the C++ program as described next.

Modifications  of  the  C++  Program 

There are two reasons for modifying the C++ program. Since we replace the target microprocessor with the PC, the first group of modifications requires:

•  replacing functions and libraries of the target operating system (pSOS [6]) with the standard C++ functions and libraries (stdlib.h, string.h, time.h, etc.),

•  replacing functions for terminal communication with standard functions for reading from and writing to the standard input/output devices.

The second group of modifications is due to the communication with the VHDL simulator. These modifications include:

•  adding the functions ReadAccess() and WriteAccess() for communication with the VHDL simulator. These functions read data from the output file or write data to the input file of the VHDL simulator.

•  replacing program lines containing *Point_HW, where Point_HW is a pointer to a location in the hardware devices, with calls to ReadAccess() and WriteAccess().

The approach is illustrated with a sample C++ program shown in Figure 2.a. A value of 0XFF is written to the memory location 0X3000 in the hardware device and the contents of this location are then assigned to the variable a. The modified code is illustrated in Figure 2.b. Modifications of the code are done automatically in the preprocessing step. In the next subsection, the communication between the C++ program and the VHDL simulator is described.

a.)
    #define REGISTER 0X3000

    int main()
    {
        int *Point_HW, a;
        Point_HW = (int *)REGISTER;
        *Point_HW = 0XFF;
        a = *Point_HW;
    }

b.)
    #define REGISTER 0X3000

    int main()
    {
        int a;
        WriteAccess(REGISTER, 0XFF);
        a = ReadAccess(REGISTER);
    }

Figure 2: Sample C++ program.

Communication  Between  the  C++  Program  and  VHDL  Simulator 

When an access to a location in the hardware devices is detected in the C++ program, data for the VHDL simulation are automatically written to the command file and the simulator is invoked. The simulator may return the processed data in the output file; the C++ program then reads the obtained results and continues execution. It must be noted that the C++ program communicates with the VHDL simulator only if an access to the registers in hardware is detected. Accesses to other memory addresses do not affect the flow of the C++ program.


Figure  3:  Communication  between  C++  program  and  VHDL  simulator. 

However, the co-simulation can be accelerated by decreasing the number of communications between the C++ program and the simulator. A sequence of write accesses is then performed with a single run of the simulator once a read access is detected.
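
The batching of write accesses can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the file-based exchange with the VHDL simulator is replaced by a hypothetical in-memory stand-in (SimulatorStub), and the wrapper class HardwareProxy is likewise invented for the example. Only the function names WriteAccess and ReadAccess come from the text.

```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for one run of the VHDL simulator. In the described
// environment this step writes a command file, invokes the simulator and
// reads its output file; here the hardware registers are modelled by a map.
struct SimulatorStub {
    std::map<std::uint32_t, int> registers;
    int runs = 0;

    // Applies all pending writes in order, then returns the value at readAddress.
    int run(const std::vector<std::pair<std::uint32_t, int>>& writes,
            std::uint32_t readAddress) {
        ++runs;
        for (const auto& w : writes) registers[w.first] = w.second;
        return registers[readAddress];
    }
};

class HardwareProxy {
public:
    explicit HardwareProxy(SimulatorStub& sim) : sim_(sim) {}

    // Write accesses are only queued; no simulator run is started.
    void WriteAccess(std::uint32_t address, int value) {
        pending_.emplace_back(address, value);
    }

    // A read access flushes the queued writes in a single simulator run.
    int ReadAccess(std::uint32_t address) {
        const int value = sim_.run(pending_, address);
        pending_.clear();
        return value;
    }

private:
    SimulatorStub& sim_;
    std::vector<std::pair<std::uint32_t, int>> pending_;
};
```

With this arrangement, two consecutive calls to WriteAccess followed by one ReadAccess cost one simulator run instead of three, which is the saving described above.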
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III.  Performance  Estimation  of  the  System 

One of the main issues in HW/SW co-design is to optimize the design with respect to timing performance, system cost, power consumption, etc. Here, we describe the method for estimating the timing performance of the system. Since the timing performance of the hardware devices can be accurately determined with the VHDL simulator, the basic idea is to estimate the execution time of the C++ program on the PC.

The timing performance of the software is estimated with the C++ function clock, which returns the number of ticks corresponding to the processor execution time. Since the functions WriteAccess() and ReadAccess() exist only for the communication between the C++ program and the VHDL simulator, they are both excluded from this calculation. The approach is illustrated in Figure 4.
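
One way to exclude the communication functions from the measurement is sketched below. It is an illustration under stated assumptions rather than the environment's actual code: the class SoftwareTimer and its methods enterAccess and leaveAccess are hypothetical names, and the idea is that the generated ReadAccess() and WriteAccess() wrappers would call them on entry and exit. The tick source is a parameter only so that the bookkeeping can be checked without real processor time; by default the standard function clock is used.

```cpp
#include <ctime>

// Accumulates processor ticks while excluding the time spent inside the
// communication functions (ReadAccess/WriteAccess in the text).
class SoftwareTimer {
public:
    using TickSource = std::clock_t (*)();

    // By default the ticks come from std::clock(); another source can be
    // supplied for checking the bookkeeping.
    explicit SoftwareTimer(TickSource source = &SoftwareTimer::cpuClock)
        : now_(source), start_(source()), excluded_(0), enteredAt_(0) {}

    // Called at the start and at the end of a communication function.
    void enterAccess() { enteredAt_ = now_(); }
    void leaveAccess() { excluded_ += now_() - enteredAt_; }

    // Ticks elapsed since construction, minus the excluded intervals.
    std::clock_t softwareTicks() const { return (now_() - start_) - excluded_; }

private:
    static std::clock_t cpuClock() { return std::clock(); }

    TickSource now_;
    std::clock_t start_;
    std::clock_t excluded_;
    std::clock_t enteredAt_;
};
```

Dividing softwareTicks() by CLOCKS_PER_SEC gives the estimated software execution time in seconds on the PC.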



Figure  4:  Estimating  the  timing  performance  of  the  system. 

IV.  Conclusions 

In this paper, an open environment for HW/SW co-simulation is presented. The presented environment is open with respect to the target system and to the programming tools used for simulation. Future work will focus on accelerating the HW/SW co-simulation.
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Abstract.  We  discuss  an  simple  approach  of  voltage-to-current  conversion  intended 
to  provide  a  front-end  interface  for  current-mode  processing  systems.  The  circuit  principle 
is  based  on  a  high-resistive  CMOS  inverter  followed  by  current-mirrors.  Over  the  entire 
rail-to-rail  input  range  an  output  linearity  error  less  than  4.37%  and  THD  of  2.253%  are 
achieved.  The  six-transistor  circuit  does  not  require  any  bias  voltage  and  operates  for 
power  supply  down  to  3.3V,  with  a  PSRR  of  30dB  and  1.6mW  power  dissipation. 


I.  Introduction 


One of the main drawbacks of existing MOS technologies is that the available resistors are low-valued (2.5 kΩ/□) and inaccurate (20% mismatch). However, many IC applications require linear transconductors (voltage-to-current converters), e.g. for implementing continuous-time active filters [7, 8] and for interfacing to current-mode on-chip processing [9]. With an operational transconductance amplifier (OTA), the input voltage range is limited to a fraction of the supply range [1, 7, 8]; otherwise, extensive circuit techniques are needed to obtain true rail-to-rail input operation [2, 3]. We propose a CMOS transconductor circuit which consists of six transistors and operates over a rail-to-rail input voltage range. The high-ohmic input is suitable for decoupling the input node from the preceding output, as needed when generating a current from a sensitive voltage node whose value is stored on a capacitor [4]. Because it provides a voltage-current interface, another application of this circuit is analog high-level synthesis [5], at the transition from an information flow graph (i.e. without signal denotation values) to a signal flow graph, called electrical conditioning synthesis [5].




Figure 1: Single-input CMOS V-I converter and its DC-transfer curve, including the absolute linearity error referred to a linear regression (L1,2 = 30 µm).
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II.  Circuit  description  and  analysis 

The discussed circuit is a modification of the voltage-to-current converter introduced by Wang [10]. Wang's proposal of a CMOS transconductor comprised the circuit consisting of transistors M3-M6 in Fig. 1, left. We extended this circuit by an inverter M1-M2 at its input. With long-channel transistors M1, M2 (L > 20 µm) the inverter stage becomes highly resistive compared to the succeeding minimally sized transistors in the current mirrors, so the transfer curve of the inverter is linearized, as shown in Fig. 2, left. The inverter stage can be regarded as a high-impedance current source providing the input current to the mirror M3-M6. The current magnitude and flow direction are determined by the operating regions of the inverter transistors, which in turn depend on the input voltage Vin (see Fig. 2, middle). The DC voltage at node 1 is fixed close to the Vdd/2 potential, and the rail-to-rail amplitude of the input signal Vin causes only a small deflection, in a range of a hundred mV around this potential. Transistors M3 through M6 operate in saturation, whereas M1 and M2 change from turned-off through saturation to the linear region, and vice versa. The current mirror M3-M6 performs an output impedance transformation from high- to low-resistive circuitry (g_DS5 + g_DS6 ≫ g_DS1 + g_DS2) while retaining the current magnitude at nodes Out and 1.


Figure 2: Left: Linearizing the inverter characteristic by stretching the region around the midpoint [2.5;2.5] (principle diagram for a single power supply of Vdd = 5 V).

Middle: Operating regions of transistors M1, M2 depending on the input voltage Vin.
Right: V1 vs. Vin and corresponding relative linearity error referenced to the V1 dynamic range.

A.  Circuit  analysis 

Since the transistors in the inverter stage change their operating region as Vin varies within the range gnd-Vdd, we cannot apply small-signal transistor models to calculate the transfer characteristic. By writing the current balance equation (1) at node 1 for an input voltage Vin = Vdd/2, when all transistors M1 through M4 are in saturation, we can determine the midpoint potential, i.e. the operating point in the circuit close to Vdd/2.


$I_{D1} + I_{D3} = I_{D2} + I_{D4}$  (1)

$\frac{K_P W_1}{2 L_1}(V_{SG1} - V_{thP})^2 (1 + \lambda_1 V_{SD1}) + \frac{K_P W_3}{2 L_3}(V_{SG3} - V_{thP})^2 (1 + \lambda_3 V_{SD3}) = \frac{K_N W_2}{2 L_2}(V_{GS2} - V_{thN})^2 (1 + \lambda_2 V_{DS2}) + \frac{K_N W_4}{2 L_4}(V_{GS4} - V_{thN})^2 (1 + \lambda_4 V_{DS4})$  (2)

Neglecting the channel-length modulation effect λ in (2), the DC voltage V1 results from solving a quadratic equation. For a 2.4 µm CMOS technology with symmetrical thresholds VthN ≈ VthP ≈ 0.9 V and transconductance parameters KP,N of 57 and 17 µA/V2 for NMOS and PMOS, respectively, we calculate a DC voltage of V1 = 2.513 V, whereas the Spectre simulator yields an operating point of 2.516 V. When V1 ≈ Vdd/2, only a quiescent current flows through the current mirror, thus Iout = 0, except for a small offset because β3 and β4 are not exactly equal. Applying equation (2) for a general input voltage Vin yields:

(3) 

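This can be converted into a linear equation of the type $ax + b = cy + d$, which results in:

$$ V_1 = -\frac{\beta_{1,2}}{\beta_{3,4}} V_{in} + \frac{(\beta_{1,2} + \beta_{3,4})\,(V_{dd}^2 + V_{thP}^2 - V_{thN}^2 - 2 V_{dd} V_{thP})}{2\,\beta_{3,4}\,(V_{dd} - V_{thP} - V_{thN})} \tag{4} $$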

However, this linear dependence is valid only if both transistors M1 and M2 stay in saturation. When $V_{in} > (V_{dd}/2 + V_{thN})$ or $V_{in} < (V_{dd}/2 - V_{thP})$, equation (2) changes, e.g. for $V_{in} = 0$ V, into:


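$$ K_P \frac{W_1}{L_1}\left(V_{dd} - V_{in} - V_{thP} - \frac{V_{dd} - V_1}{2}\right)(V_{dd} - V_1) + \frac{K_P W_3}{2 L_3}(V_{dd} - V_1 - V_{thP})^2 = \frac{K_N W_4}{2 L_4}(V_1 - V_{thN})^2 \tag{5} $$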


Hence the solution for $V_1$ becomes nontrivial, consisting of a term linear in $V_{in}$ plus an additional square-root term containing $V_{in}^2$. According to Fig. 2 (right), the mismatch between the linear dependence (4) and equation (5) amounts to less than 5.214% at the margins of the input range.
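As a quick numerical illustration of the linear relation (4), the sketch below evaluates $V_1 = -(\beta_{1,2}/\beta_{3,4}) V_{in} + \mathrm{offset}$ for the symmetrical thresholds quoted in the text. The function name and the example ratio $\beta_{1,2}/\beta_{3,4} = 0.1$ are assumptions chosen for illustration only; they are not the device sizes used by the authors.

```python
def v1_linear(v_in, beta_12, beta_34, v_dd=5.0, v_thn=0.9, v_thp=0.9):
    """Node-1 voltage from the linear relation (4), valid while M1 and M2 are saturated.

    beta_12 and beta_34 are the (assumed equal) gain factors of the inverter pair
    M1/M2 and of the diode-connected pair M3/M4; only their ratio matters.
    """
    slope = -beta_12 / beta_34
    offset = ((beta_12 + beta_34)
              * (v_dd**2 + v_thp**2 - v_thn**2 - 2.0 * v_dd * v_thp)
              / (2.0 * beta_34 * (v_dd - v_thp - v_thn)))
    return slope * v_in + offset


if __name__ == "__main__":
    # Hypothetical ratio beta_12 / beta_34 = 0.1 (a long-channel inverter).
    for v_in in (0.0, 2.5, 5.0):
        print(f"Vin = {v_in:.1f} V -> V1 = {v_1:.3f} V"
              if (v_1 := v1_linear(v_in, 1.0, 10.0)) is not None else "")
```

With this hypothetical ratio the node voltage moves by about ±0.25 V around the 2.5 V midpoint over the full 0 V to 5 V input range, which is the "stretching" effect described above.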


Transistor size | $L_{1,2}$ = 20 µm | $L_{1,2}$ = 30 µm | $L_{1,2}$ = 60 µm
circuit analysis for $v_1$ | ±233 mV | ±171 mV | ±91 mV
Spectre simulation | -233 mV / +230 mV | -156 mV / +154 mV | -78 mV / +76 mV
rel. linear. error of $v_1$ | 5.214% | 4.335% | 3.58%
circuit analysis for $i_{out}$ | ±48.39 µA | ±31.68 µA | ±15.565 µA
Spectre simulation | -49.32 µA / +51.55 µA | -32.83 µA / +34.53 µA | -16.31 µA / +17.37 µA
rel. linear. error of $i_{out}$ | 4.37% | 3.42% | 2.63%

Table 1: Comparison of circuit analysis and simulation results for $v_1$ and $i_{out}$ signals

Due to the fixed operating point at node 1, the transfer characteristic of the current mirror can be determined using small-signal models. Regarding the voltage $v_1$ as its input signal, the output current is given by

$$ i_{out} = \frac{(g_{m5} + g_{m6})\, v_1}{(g_{DS5} + g_{DS6}) R_x + 1} \tag{6} $$

with the transistor parameters calculated in saturation by $g_m = K_P \frac{W}{L}(V_{GS} - V_{th})(1 + \lambda V_{DS})$ and $g_{DS} = K_P \frac{W}{2L}(V_{GS} - V_{th})^2 \lambda$, with both $V_{DS}$ and $V_{GS} \approx 2.5$ V. For ideal performance the output operating point at node Out also has to be set near $V_{dd}/2$. Equation (6) is valid only as long as M5, M6 remain in saturation, i.e. $V_{GS} - V_{th} = 1.6\ \mathrm{V} < V_{DS} = |V_{dd}/2 - I_{out} R_x|$. This condition limits the maximal impedance value $R_x$ of the succeeding stage.
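The small-signal relation (6) and the saturation condition can be checked numerically with a few lines of code. The following is a minimal sketch: the function names and the example conductance values are hypothetical, and the saturation test simply evaluates the inequality $V_{GS} - V_{th} < |V_{dd}/2 - I_{out} R_x|$ as quoted in the text.

```python
def iout_small_signal(v1, gm5, gm6, gds5, gds6, rx):
    """Equation (6): i_out = (gm5 + gm6) * v1 / ((gds5 + gds6) * rx + 1)."""
    return (gm5 + gm6) * v1 / ((gds5 + gds6) * rx + 1.0)


def stays_saturated(i_out, rx, v_dd=5.0, v_overdrive=1.6):
    """Check V_GS - V_th < |Vdd/2 - I_out * R_x| for the output transistors M5, M6."""
    return v_overdrive < abs(v_dd / 2.0 - i_out * rx)


if __name__ == "__main__":
    # Hypothetical small-signal values, chosen only to exercise the formulas.
    i_out = iout_small_signal(v1=0.2, gm5=100e-6, gm6=100e-6,
                              gds5=1e-6, gds6=1e-6, rx=10e3)
    print(f"i_out = {i_out * 1e6:.2f} uA, saturated: {stays_saturated(i_out, 10e3)}")
```

For the largest output current listed in Table 1 (48.39 µA), the inequality holds for a 10 kΩ load but fails for a 30 kΩ load, which illustrates how the condition bounds $R_x$.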

The AC behaviour of the transconductor is determined by the pole at the output node of the circuit, given by

$$ \omega_{p1} = \frac{g_{DS}}{C_{out}} \tag{7} $$

where in the expressions $g_{DS} = (g_{DS5} + g_{DS6} + 1/R_x)$ and $C_{out}$, which sums the parasitic output capacitances of M5, M6 and $C_x$, the load $R_x \parallel C_x$ dominates. For current-mode applications, e.g. a following current comparator or conveyor, the bandwidth can be increased up to the hundred-MHz range owing to the decreased capacitance $C_{out}$ and small $R_x$. Since the AC analysis uses small-signal transistor parameters, a reliable statement about the frequency behaviour of a rail-to-rail transconductor can be made only from the harmonic distortion obtained in a transient analysis. The Spectre simulation results have shown a THD of less than 5% up to 18.1 MHz.
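To see how the load sets the bandwidth, the output pole (7) can be evaluated directly. This is a sketch under stated assumptions: the function name and the example load ($R_x$ = 1 kΩ, total output capacitance 1 pF, negligible transistor output conductances) are hypothetical and serve only to show the order of magnitude.

```python
import math


def output_pole_hz(gds5, gds6, rx, c_out):
    """Pole frequency f_p1 = g_DS / (2*pi*C_out), with g_DS = gds5 + gds6 + 1/R_x."""
    g_ds = gds5 + gds6 + 1.0 / rx
    return g_ds / (2.0 * math.pi * c_out)


if __name__ == "__main__":
    f_p1 = output_pole_hz(gds5=0.0, gds6=0.0, rx=1e3, c_out=1e-12)
    print(f"f_p1 = {f_p1 / 1e6:.1f} MHz")
```

A low-resistive load of this size places the pole above 100 MHz, in line with the hundred-MHz range mentioned for current-mode applications.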


Figure 3: Power supply rejection ratio (PSRR, left) and total harmonic distortion (THD, right; peak-to-peak input voltage 5 V) of the V-I converter, depending on the transistor width in the current mirrors, for three different transistor lengths in the inverter, both at 100 kHz.


B.  Design  considerations 

Although the circuit analysis has been carried out for minimum-sized transistors, except for the long-channel inverter, transistor dimensions of > 5 µm are recommended for good mirror matching. Fig. 5 shows the mask layout of the proposed voltage-to-current converter with a 60-µm-long transistor in the inverter stage and a cascode current mirror. The chip area is small (180 µm × 110 µm) compared to some other OTA implementations [2]. Another advantage of this circuit is that it does not require any bias voltage. The price of this advantage is increased power consumption. Owing to a quiescent current of 225 µA / 143 µA through the current mirrors and up to 60 µA / 40 µA in the inverter stage, the total power consumption at the maximum PSRR of 30 dB is approximately 2.5 mW / 1.6 mW for a 20-µm / 30-µm-long inverter, respectively. Fig. 3 (left) shows that the maximally achievable PSRR is close to 30 dB when the current mirror is properly dimensioned. The PSRR is symmetrical with respect to the positive and negative power supply rails. The circuit operates from a single supply $V_{dd}$ of 5 V as well as 3.3 V without any reduction in PSRR. Running from a 3.3 V supply, the transconductor can also provide a simple input interface to systems with a higher power supply. For current symmetry with regard to the $V_{in}$ range, the gnda potential ($V_{dd}/2$) has to be set equal for both the lower (3.3 V) and the higher (5 V) powered systems. Another approach for a 3.3 V supply system to achieve a symmetrical current range relative to the 5 V system is to shift the transconductor characteristic as shown in Fig. 4 (left).

As a simple approach to designing a suitable V-I converter, the following scheme can be considered:

• estimate $L_{5,6}$ from the required output $g_{DS}$ for a given $R_x$ of the following stage

• adjust $g_{m5,6}$ from equation (6) for the desired output current range

• optimize the transistors in the current mirror for the lowest THD and the largest PSRR

• from equation (4) determine the required inverter length $L_{1,2}$ according to $V_{in}$

• check the required performance and, if necessary, redesign.


Figure 4: Left: Movable DC transfer curve of the V-I converter of Fig. 1.
Right: Schematic of a transconductor with cross-coupled mirrors.


III.  Circuit  modification 

By changing the dimensions of the current-mirror transistors, different output current offsets can be set, see equation (4). In this way the transfer curve can be moved to achieve either positive or negative currents, as shown in Fig. 4 (left). By applying the additional cross-coupled current mirrors, a 180° phase shift and thus inverted transconductance curves compared to Fig. 4 (left) can be attained. To make the circuit less sensitive to output voltage deviations caused by the voltage drop over the load $R_x$, the cascode mirror version of Fig. 5 can be employed. The additionally inserted transistors increase the output impedance; however, a larger channel length for M1, M2 is then required to maintain the same linearity. Moreover, the THD increases up to 4%, and the PSRR is noticeably reduced.

IV.  Final  Remarks 

A very simple circuit approach for voltage-to-current conversion has been presented. The linear characteristic results from stretching the CMOS inverter curve with high-ohmic transistors. It has been shown that the circuit can be optimized for best linearity, lowest harmonic distortion, and largest PSRR by appropriately dimensioning the two types of transistor pairs used in the inverter and in the current mirrors. Even for a true rail-to-rail operating range of 5 V, the proposed circuit attains a THD of 2.253% and a linearity error of 3.42%, whereas the linearity and THD reported in [2, 3] are given only for approximately 60 percent of the rail-to-rail dynamic range. For comparison, in our circuit for a 3 V peak-to-peak input (60 percent of the rail-to-rail dynamic range) the achieved linearity error of 0.9% is better than in [1, 2, 3, 8], and the THD of 0.75% is similar to those reported in [2, 3, 6, 8].

Figure 5: Cascode version of the voltage-to-current converter and corresponding layout in a 2.4 µm CMOS technology ($L_{inv} = 60$ µm).

Abstract. In this work, we present a novel low-voltage low-power current conveyor designed and optimized for use in capacitive multiplier topologies. The proposed circuit makes it possible to obtain very high capacitive gain factors (up to 50000) with a high degree of accuracy.

In order to investigate the effectiveness of this topology, we have implemented a low cut-off frequency filter in a standard 0.5 µm CMOS technology. The supply voltage is 2 V and the quiescent power consumption is 110 µW. SPICE simulations confirm that the described multiplication technique is sufficiently accurate for capacitances higher than 2 pF.

I.  Introduction 

Research activity in analog circuit design has moved towards the realization of low-voltage low-power circuits [1,2]. As more and more complex systems are being integrated on chip, area reduction is of fundamental importance. It is well known that a limiting problem in integrated circuits is the realization of capacitance values higher than 100 pF, because of the large area they occupy.

In this paper, we intend to give an original contribution to the problem of realizing on-chip capacitance values higher than those that can be integrated directly. This is done by performing a capacitance multiplication, so that high equivalent RC products, which cannot be directly implemented in integrated form, are realized. With respect to other topologies proposed by the same authors [3,4], this circuit contains a current divider, which reduces power consumption. It can also be utilized in the design of very low cut-off frequency filters, which are needed in a wide range of applications such as biopotential amplifiers and the control of slowly varying geological variables [5,6].

II. The proposed topology

The proposed topology of the low-voltage low-power current conveyor performing the capacitance multiplication is shown in Fig. 1.


Fig.  1  CCII  block  scheme  for  capacitance  multiplication 




From the circuit shown in Fig. 1, assuming ideal components, a straightforward analysis shows that

$$ Z_{EQ} = \frac{1}{s K C_S} \tag{1} $$


Fig. 2 shows the simplified schematic of the CCII. A low-voltage op-amp performs the voltage-following action between Y and X with good accuracy, while the output current (high-impedance node) is obtained by copying the current of the output stage of the op-amp. The op-amp has a complete rail-to-rail dynamic range, ensured by a traditional constant-gm input stage working in weak inversion (one-to-one current mirror switch), followed by a typical low-voltage AB-biased output stage.

The impedance at the input node X is kept sufficiently low by the feedback of the op-amp (open-loop gain ≈ 80 dB). Unfortunately, this impedance reduction takes place only as long as the feedback is effective, that is, in the low-frequency range, so this technique is appropriate only for operation at not very high frequencies.


Fig. 3 Schematic of the current divider. $K = A \cdot (I_{b1}/I_{b2})$

In the circuit depicted in Fig. 3, $I_A$ and $I_B$ represent, respectively, the pull-down and pull-up currents of the output stage of the current conveyor. The principle of the current division is based on the translinear loop composed of transistors MN1-MN2 and MN10-MN11, and the proper operation of the circuit is ensured by biasing all the transistors in weak inversion, where the drain current shows an exponential dependence on the gate-source voltage, as follows


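$$ I_D = I_{D0} \frac{W}{L} e^{V_{GS}/(n V_T)}. \tag{2} $$

From Fig. 3, we also have

$$ V_{GS(MN1)} - V_{GS(MN2)} = V_{GS(MN11)} - V_{GS(MN10)} \tag{3} $$

From (2) and (3) we obtain

$$ \frac{I_{D(MN1)}}{I_{D(MN2)}} = \frac{I_{D(MN11)}}{I_{D(MN10)}}, \tag{4} $$

from which

$$ \frac{I_{D(MN1)} - I_{D(MN2)}}{I_{D(MN1)} + I_{D(MN2)}} = \frac{I_{D(MN11)} - I_{D(MN10)}}{I_{D(MN10)} + I_{D(MN11)}}. \tag{5} $$

Since the differential current of MN10 and MN11 is mirrored and converted to single-ended form, we obtain, after proper substitutions,

$$ I_{OUT} = \frac{1}{A}\frac{I_{b2}}{I_{b1}}(I_A - I_B) = \frac{1}{K}(I_A - I_B) \tag{6} $$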

where A is a further division factor, implemented by transistors MN7 and MP7, which has been fixed to 10. Fig. 4 shows the THD of the current divider vs. input current amplitude for different $I_{b2}$ values and $A = 10$. From this figure it can be deduced that the body effect of transistors MN1 and MN2 does not affect the linearity of the circuit. The only consequence is a slight increase of the K value.
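The translinear principle behind the divider can be illustrated numerically. The sketch below assumes the usual exponential weak-inversion law $I_D = I_{D0}(W/L)\exp(V_{GS}/(nV_T))$; the values of $I_{D0}$, $n$, $V_T$ and the gate-source voltages are hypothetical, as are the function names. It checks that the loop constraint on the gate-source voltages forces equal current ratios, and evaluates the overall division factor $K = A \cdot (I_{b1}/I_{b2})$.

```python
import math


def weak_inversion_current(v_gs, i_d0=1e-15, w_over_l=1.0, n=1.5, v_t=0.0259):
    """Exponential weak-inversion law; i_d0, n and v_t are assumed example values."""
    return i_d0 * w_over_l * math.exp(v_gs / (n * v_t))


def division_factor(a, i_b1, i_b2):
    """Overall current division K = A * (I_b1 / I_b2)."""
    return a * i_b1 / i_b2


def divider_output(i_a, i_b, a, i_b1, i_b2):
    """Divider output current I_OUT = (I_A - I_B) / K."""
    return (i_a - i_b) / division_factor(a, i_b1, i_b2)


# Translinear loop: V_GS(MN1) - V_GS(MN2) = V_GS(MN11) - V_GS(MN10).
vgs_mn1, vgs_mn2, vgs_mn10 = 0.30, 0.28, 0.10   # hypothetical bias points
vgs_mn11 = vgs_mn10 + (vgs_mn1 - vgs_mn2)
ratio_in = weak_inversion_current(vgs_mn1) / weak_inversion_current(vgs_mn2)
ratio_out = weak_inversion_current(vgs_mn11) / weak_inversion_current(vgs_mn10)

if __name__ == "__main__":
    print(f"I(MN1)/I(MN2) = {ratio_in:.4f}, I(MN11)/I(MN10) = {ratio_out:.4f}")
    print(f"K = {division_factor(10, 10e-6, 2e-9):.0f}")
```

With $A = 10$, $I_{b1}$ = 10 µA and $I_{b2}$ = 2 nA the division factor evaluates to $5 \cdot 10^4$, so a 1 µA differential input current is scaled down to 20 pA.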

In Fig. 5, the magnitude of the equivalent input impedance, $Z_{EQ}$, is plotted versus frequency for a 20 pF starting capacitance and a current division of $5 \cdot 10^4$, obtained with $I_{b1} = 10$ µA, $I_{b2} = 2$ nA and $A = 10$.


Fig. 4. Current divider THD vs. input current amplitude

Fig. 5. Magnitude of $Z_{EQ}$ for $C_S = 20$ pF and a current division of $5 \cdot 10^4$

Fig. 6 shows the schematic of the low-pass filter, while Fig. 7 reports the magnitude response of the proposed filter. The resistance R has been fixed to 159 kΩ.
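The numbers quoted for the multiplier and the filter can be cross-checked with a short calculation. This sketch assumes an ideal multiplier ($Z_{EQ} = 1/(sKC_S)$) and a first-order RC low-pass response for the filter of Fig. 6; the first-order assumption and the function names are ours, not taken from the paper.

```python
import math


def equivalent_capacitance(k, c_s):
    """Ideal multiplied capacitance C_eq = K * C_S."""
    return k * c_s


def z_eq_magnitude(freq_hz, k, c_s):
    """|Z_EQ| = 1 / (2*pi*f*K*C_S) for the ideal multiplier."""
    return 1.0 / (2.0 * math.pi * freq_hz * k * c_s)


def rc_cutoff_hz(r, c):
    """Cut-off frequency of a first-order RC low-pass, f_c = 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * r * c)


if __name__ == "__main__":
    c_eq = equivalent_capacitance(5e4, 20e-12)
    print(f"C_eq = {c_eq * 1e6:.2f} uF")
    print(f"f_c  = {rc_cutoff_hz(159e3, c_eq):.3f} Hz")
```

A 20 pF capacitor multiplied by $5 \cdot 10^4$ behaves like 1 µF, and together with R = 159 kΩ this gives a cut-off frequency of about 1 Hz.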


Fig. 6 Low-pass filter block scheme (CCII, current divider K, input current $I_{IN}$, divided current $I_{IN}/K$)

Fig. 7. Magnitude response of the proposed filter.


III. Final remarks

A novel solution for the implementation of a capacitive multiplier has been presented. The proposed circuit features low supply voltage, low power consumption and good multiplication accuracy over a sufficiently wide bandwidth. An example of the circuit application in the design of a low cut-off (1 Hz) low-pass filter has also been presented.

Abstract. A new approach to single-neuron and neural-network control problems based on the sliding mode is considered in this article. Sliding mode analysis provides advantages such as stability. The region of stable operation of the neuron and the neural network is estimated, and some conclusions are drawn on the basis of this research.


I.  Introduction 

Information and neural science, as a result of studying the mechanisms and structures of the brain, have proposed artificial neural networks (ANN). This has led to the development of new computational models, based on this biological background, for solving complex problems like pattern recognition, fast information processing, learning and adaptation.

Single-neuron and neural-network control is connected with discontinuities. These discontinuities cause stability problems on the discontinuity surface. As shown in the works of V. Utkin [1] and V. Mkrttchian [2], a sliding mode arises in discontinuous control.

The sliding mode gives the following advantages:

• The trajectories of the state vector belong to manifolds of lower dimension than that of the whole state space, therefore the order of the differential equations describing sliding motions is also reduced.

• In most practical systems the sliding motion is control-independent and determined merely by the properties of the control plant and the position (or equations) of the discontinuity surfaces.

• Under certain conditions the sliding mode may become invariant to variations of the dynamic characteristics of the control plant, which is a central problem dealt with in the theory of automatic control.

Even if we employ a continuous control algorithm, the control itself is shaped as a high-frequency discontinuous signal whose mean value is equal to the desired continuous control.

II.  Dynamic  Neuron  and  Neural  Network 

The network architecture is defined by the basic processing elements and the way in which they are interconnected. The basic processing element of the connectionist architecture is often called a neuron by analogy with neurophysiology, but other names such as perceptron or adaline are also used. The basic model of a neuron is illustrated in [3, 4]. The neuron is composed of three components: a weighted summer; a linear dynamical single-input, single-output (SISO) system; and a non-dynamical, non-linear function, which is also called the activation function.

The weighted summer is described by

$$ s_i(t) = \sum_j w_{ij} x_j(t) + \sum_k b_{ik} u_k(t) + z_i, \tag{1} $$

giving a weighted sum $s_i$ in terms of the internal inputs $x_j$, external (control) inputs $u_k$ and corresponding weights $w_{ij}$ and $b_{ik}$, together with constants $z_i$ which play the role of a standard bias; t denotes a time variable, which can be either continuous or discrete.

Equation (1) can be written in matrix form as

$$ s_i(t) = W_i x(t) + B_i u(t) + z_i. \tag{2} $$

The linear dynamical system has input $v_i$ and output $y_i$. The variable $y_i$ is the ith neuron output. Its mathematical model can be written for continuous systems as

$$ T_i \dot{y}_i(t) + y_i(t) = v_i(t). \tag{3} $$

The discrete-time model can be represented as

$$ T_i y_i(t+1) + (1 - T_i) y_i(t) = v_i(t). \tag{4} $$

The non-dynamical non-linear function $f(\cdot)$ (activation function) gives the signal $v_i(t)$ in terms of the summer output $s_i(t)$:

$$ v_i = f_i(s_i). \tag{5} $$
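The three components of the neuron model can be put together in a few lines of code. The following is a minimal sketch of the discrete-time variant: the weighted sum, the activation $v_i = f_i(s_i)$, and the first-order recursion $T_i y_i(t+1) + (1 - T_i) y_i(t) = v_i(t)$ solved for $y_i(t+1)$. The function names and the choice of `tanh` as default activation are assumptions made for illustration.

```python
import math


def weighted_sum(w, x, b, u, z):
    """Weighted summer: s = sum_j w_j*x_j + sum_k b_k*u_k + z."""
    return (sum(wj * xj for wj, xj in zip(w, x))
            + sum(bk * uk for bk, uk in zip(b, u))
            + z)


def neuron_step(y, w, x, b, u, z, t_const, f=math.tanh):
    """One discrete-time update of the neuron output.

    Solves T*y(t+1) + (1 - T)*y(t) = v(t) for y(t+1), with v = f(s).
    """
    v = f(weighted_sum(w, x, b, u, z))
    return (v - (1.0 - t_const) * y) / t_const


if __name__ == "__main__":
    y = 0.0
    for step in range(5):
        # Constant input and identity activation, so v(t) = 1 at every step.
        y = neuron_step(y, w=[1.0], x=[1.0], b=[], u=[], z=0.0,
                        t_const=2.0, f=lambda s: s)
        print(step, y)
```

For a constant input the output approaches $v$ step by step, since $y = v$ is the fixed point of the recursion.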


III.  Sliding  Mode  Technique 

A distinguishing feature of the differential equations describing any control system is known to be the presence of a scalar or vector parameter u referred to as control:

$$ \dot{x} = f(x, t, u), \quad u \in R^m. \tag{6} $$

In early regulators, the controls were mostly of relay type. As a result, the right-hand side of the differential equation of the system motion was a discontinuous function of the system state vector. For systems with isolated discontinuity points, some analysis and synthesis methods have been designed based on the classical theory of differential equations with the use of point-to-point transformations and averaging at the occurrence of high-frequency switching.

However, in attempts to describe certain application problems mathematically, the same case as in the Coulomb-friction mechanical system was often encountered, in which the totality of discontinuity points proved to be a set of nonzero measure in time. This fact is easily revealed for a sufficiently general class of discontinuous controls defined by the relationships

$$ u_i(x,t) = \begin{cases} u_i^+(x,t) & \text{with } s_i(x) > 0 \\ u_i^-(x,t) & \text{with } s_i(x) < 0 \end{cases} \tag{7} $$

where $u^T = (u_1, \ldots, u_m)$ and all functions $u_i^+(x,t)$ and $u_i^-(x,t)$ are continuous. The state vector of such systems may stay on one of the discontinuity surfaces or their intersection for a finite time interval. For example, the system state vector trajectories belong to some discontinuity surface $s_i(x) = 0$ if in the vicinity of this surface the velocity vectors $f(x,t,u)$ are directed toward each other.

As evidenced by these examples, the motion trajectories which belong to the set of discontinuity points are singular, since for any combination of the continuous controls $u_i^+(x,t)$ and $u_i^-(x,t)$ they differ from the system trajectories. An accepted term for the motion on discontinuity surfaces is sliding mode. Incidentally, the motion along the segment $|x| \le P_0/k$ in the Coulomb-friction mechanical system is also a sliding mode motion.

Here it is apt to note that a sliding mode exists on a discontinuity surface whenever the distance to this surface, s, and the velocity of its change, $\dot{s}$, are of opposite signs, i.e. when

$$ \lim_{s \to -0} \dot{s} > 0 \quad \text{and} \quad \lim_{s \to +0} \dot{s} < 0. \tag{8} $$

The mathematical description of sliding modes is quite a challenge. It requires the design of special techniques. The solution of the equation $\dot{x} = f(x,t)$ is known to exist and be unique if a Lipschitz constant L can be found such that for any two vectors $x_1$ and $x_2$

$$ \|f(x_1, t) - f(x_2, t)\| \le L \|x_1 - x_2\|. \tag{9} $$

It is evident that in a dynamic system with discontinuous control, condition (9) is violated in the vicinity of the discontinuity surfaces. Indeed, if points $x_1$ and $x_2$ are on different sides of the discontinuity surface and $\|x_1 - x_2\| \to 0$, inequality (9) does not hold for any fixed value of L. Therefore, at least formally, some additional effort is needed to find a solution to systems at the occurrence of a sliding mode. Moreover, let a function x(t) pretend to be a solution lying on the set of discontinuity points. Even in this case, the way this function may turn Eq. 9 into an identity is not clear, since the control is not defined on the surface $s_i(x) = 0$.

Let us take a closer look at the physical approach to obtaining the sliding mode motion equations. Uncertainty in the system behaviour on a discontinuity surface appears as a result of the imperfection of a model of the type [1], which was supposed to idealise the real-life system. This model fails to recognise such factors as imperfections of the switching device (time delay, dead zones, hysteresis loops, inertia of elements, etc.). Besides, the equations of an actual control plant may be of an order higher than those of the model. And, finally, the instruments used to obtain the information on the state vector, which is necessary to realise the controls [2], may also prove inertial. Recognition of all these factors makes all discontinuity points isolated, thus removing mathematical (but not analytical) difficulties in describing the system behaviour. The physical approach implies the introduction of such imperfections, which subsequently tend to zero. The result obtained in such limit transitions is chosen as an appropriate mathematical description of a sliding mode.

The system considered was a second-order relay control system whose discontinuity surface was actually a straight line on the plane of co-ordinates of the error x and its derivative $\dot{x}$:

$$ c x + \dot{x} = 0, \quad c = \text{const}. \tag{10} $$

The behaviour of the system was studied under the assumption that a time delay was inherent in the switching device and, consequently, the discontinuity points were isolated. It was found that, irrespective of the control plant parameters and the disturbances affecting it, the solution of the second-order equation in sliding mode always tends to the solution of the first-order linear differential Eq. 10, which depends only on the slope c of the switching straight line.
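This order reduction is easy to reproduce in simulation. The sketch below is an illustration under our own assumptions, not the system studied in the cited works: the plant is a double integrator $\ddot{x} = u$ with relay control $u = -M\,\mathrm{sign}(cx + \dot{x})$, and the finite Euler step plays the role of the switching imperfection. All parameter values and names are hypothetical.

```python
def simulate_relay(c=1.0, m=2.0, x0=1.0, v0=0.0, dt=1e-3, t_end=5.0):
    """Euler simulation of x'' = u with relay control u = -m * sign(c*x + x').

    Returns the final error x, its derivative v and the switching function s.
    """
    x, v = x0, v0
    for _ in range(int(round(t_end / dt))):
        s = c * x + v                      # switching function, zero on the line c*x + x' = 0
        u = -m if s > 0 else m             # relay (discontinuous) control
        x, v = x + dt * v, v + dt * u      # explicit Euler step using the old state
    return x, v, c * x + v


if __name__ == "__main__":
    x_end, v_end, s_end = simulate_relay()
    print(f"x = {x_end:.4f}, x' = {v_end:.4f}, s = {s_end:.5f}")
```

After the trajectory reaches the switching line, the state chatters in a narrow band around $s = 0$ and the error decays approximately as $e^{-ct}$, i.e. as the solution of the first-order equation $cx + \dot{x} = 0$, independently of the relay amplitude.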

In all of the above cases, the sliding equations are not postulated; their validity is proved with the help of limiting procedures. However, the special form of the switching device model restricts the scope of application of these results. Therefore the question of their applicability to models of other types (for instance, a piecewise linear approximation of the discontinuous characteristic) remains unanswered. Besides, the limiting procedures treated in the above papers have been designed exclusively around the point-to-point transformation technique, which is too analytically difficult to be applied to the study of any systems but those with a single discontinuity surface.

Assume that a sliding mode exists on the manifold $s(x) = 0$, $s^T(x) = [s_1(x), \ldots, s_m(x)]$. Let us find a continuous control such that, with the initial position of the state vector on this manifold, it makes the time derivative of the vector s(x) along the trajectories of system [1] identically zero:

$$ \dot{s} = G f(x, t, u) = 0, \tag{11} $$

where the rows of the (m × n) matrix $G = \{\partial s / \partial x\}$ are the gradients of the functions $s_i(x)$.

Assume that a solution (or a number of solutions) of the system of algebraic Eq. (11) with respect to the m-dimensional control does (or do) exist. Use this solution, hereinafter referred to as the equivalent control $u_{eq}(x,t)$, in the system in place of u:

$$ \dot{x} = f[x, t, u_{eq}(x,t)]. \tag{12} $$

It is quite obvious that, by virtue of condition (11), a motion starting in $s[x(t_0)] = 0$ will proceed along the trajectories which lie on the manifold $s(x) = 0$.

The above procedure will be called the equivalent control method, and Eq. 12, obtained as a result of applying this method, will be regarded as the sliding mode equation describing the motion on the intersection of the discontinuity surfaces $s_i(x) = 0$, $i = 1, \ldots, m$.

From the geometric viewpoint, the equivalent control method implies replacing the undefined discontinuous control on the discontinuity boundary with a continuous control which directs the velocity vector in the system state space along the intersection of the discontinuity surfaces.

For example, in order to find this vector in a system with a single discontinuity surface $s(x) = 0$ at some point (x, t) [1], one should vary the scalar control from $u^-$ to $u^+$, plot the locus of $f(x,t,u)$ and find the point where it intersects the tangential plane. The point of intersection determines the equivalent control $u_{eq}(x,t)$ and the right-hand side $f(x,t,u_{eq})$ of the sliding mode differential Eq. 12.
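For a scalar control that enters the plant linearly, the condition $\dot{s} = G f(x,t,u) = 0$ can be solved for $u_{eq}$ in closed form. The sketch below works this out for a hypothetical second-order plant $\dot{x}_1 = x_2$, $\dot{x}_2 = a_1 x_1 + a_2 x_2 + u$ with the switching line $s = c x_1 + x_2$; the plant, its coefficients and the function names are assumptions for illustration. Because $f$ is affine in $u$, two evaluations of $f$ (at $u = 0$ and $u = 1$) are enough to solve the linear equation.

```python
def plant(x, u, a1=3.0, a2=-1.0):
    """Hypothetical second-order plant: x1' = x2, x2' = a1*x1 + a2*x2 + u."""
    x1, x2 = x
    return (x2, a1 * x1 + a2 * x2 + u)


def s_dot(x, u, grad_s, f=plant):
    """Time derivative of the switching function: s' = G * f(x, u)."""
    return sum(g * fi for g, fi in zip(grad_s, f(x, u)))


def equivalent_control(x, grad_s, f=plant):
    """Solve G * f(x, u) = 0 for a scalar u, assuming f is affine in u."""
    g_f0 = s_dot(x, 0.0, grad_s, f)            # G * f(x, 0)
    g_b = s_dot(x, 1.0, grad_s, f) - g_f0      # G * (f(x, 1) - f(x, 0))
    return -g_f0 / g_b


if __name__ == "__main__":
    c = 2.0
    x = (1.0, -2.0)                            # a point on the line c*x1 + x2 = 0
    u_eq = equivalent_control(x, grad_s=(c, 1.0))
    print("u_eq =", u_eq, " sliding dynamics:", plant(x, u_eq))
```

At the chosen point the sliding dynamics give $\dot{x}_1 = -c\,x_1$, so the motion on the switching line no longer depends on the plant coefficients $a_1$, $a_2$.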

IV.  Conclusion 

1. Artificial Neural Networks (ANN) are a new method of developing high-efficiency computational models.

2. Discontinuities of ANN control circuits can be eliminated using the sliding mode, which is characterized by a reduced order of the mathematical description, automatic adjustment through feedback from discontinuities, and interference-free operation.

3. A basic element of the ANN architecture is a neuron model structure described by substantiated formulae containing continuous and discrete variables and a scalar parameter formulated as a characteristic of any control system.

Abstract. Synaptic weights of the model neurons in our self-organizing artificial neural network (ANN) are modified according to the Bienenstock, Cooper and Munro (BCM) unsupervised learning algorithm. What makes the BCM rules different from any other learning procedure is the dynamic synaptic modification threshold $\theta_M$, which determines whether a neuron's activity at any moment will lead to strengthening or weakening of its synaptic weights. We design, perform and discuss several pilot experiments with 2D alphanumeric pattern classification in order to investigate the computational properties of the BCM neural network with so-called feedforward inhibition.

1  Introduction 

The BCM theory was introduced in order to explain self-organization in the developing visual cortex [1]. Later it was used to explain experience-dependent plasticity in the mature somatosensory cortex [2]. It was shown theoretically [3] and in simulations [4] that the BCM neural network with so-called feedforward inhibition can perform projection pursuit, i.e., it can find projections in which the departures from normality in the statistical distribution of input data are the most prominent. This is an important property implying the feature-detecting abilities of this neural model. Thus, the BCM neural network was successfully applied to pattern recognition tasks [5-7]. Since, to our knowledge, this model has not yet been applied to 2D alphanumeric pattern classification, we decided to explore this field of application. We constructed our own database of letters and performed several experiments exploring the architecture and training parameters of the BCM neural network.

2  Description  of  the  BCM  Neural  Network 

We now define a network of n neurons with feedforward inhibition which we use in our computer simulations of pattern classification. The input vector x is fed into a one-layer ANN in which each neuron is inhibited proportionally to the activities of the other neurons. The inhibited activity of the kth neuron reads [3]

$$ \bar{c}_k = \sigma\Big(c_k - \mu \sum_{j \ne k} c_j\Big) \tag{1} $$

where $c_k = x \cdot m_k$, $m_k$ is the synaptic weight vector, and $\sigma$ is the sigmoidal nonlinearity. It was shown that the BCM feedforward inhibition network is the first-order approximation of the BCM lateral inhibition network [3].

Figure 1: The modification function $\phi$ for two different values of $\theta_M$

The BCM learning rule is found through the gradient minimization of the objective function $R = \sum_k R_k$, where [3]

$$ R_k = -\eta \left\{ \tfrac{1}{3} E[\bar{c}_k^3] - \tfrac{1}{4} E^2[\bar{c}_k^2] \right\} \tag{2} $$

where  E  means  the  expected  value.  After  writing  down  the  corresponding  derivatives  we 
arrive  at  the  learning  rule  [3] 

mk  =  =  rj{E[(j)(ck,  Olf)^  (dk)x]  -  nYL  &AfW(pj)x]}  (3) 

dmk  1  j^k 

where  the  modificationn  threshold  $m  =  is  the  point  where  the  modification  function 
(P  =  ck(ck  -  Om)  changes  sign  from  minus  to  plus  (Fig.  1).  We  calculate  9m  as  a  moving 
average  of  the  past  squared  activity  of  a  neuron,  such  that  [2] 

eM(i)  =  (c2(t)>r  =  -  [  C2(t')  d£  (4) 

7”  J 
— oo 

The parameter $\tau$ determines the length of the recent past over which the neuron's squared
response is averaged. We call it the neuron's memory of its past activity. The smaller
(larger) $\tau$, the shorter (longer) the memory. The whole notion of the dynamic modification
threshold $\theta_M$ was biologically inspired [1]. From the above relations it follows that when
$0 < \tilde{c}_k < \theta_M$, all active synaptic weights weaken. When $\tilde{c}_k > \theta_M$, all active synaptic
weights potentiate.
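
Relations (1), (3) and (4) can be illustrated by a minimal Python sketch of a single training step. It is not the simulation code used in this paper, and it rests on several assumptions: the expectation in (3) is replaced by the current input sample, the threshold is updated as a discrete exponential moving average with time constant `tau`, and the sigmoid is taken to be 20 tanh(z/20), which is bounded in (-20, 20) and has unit slope at zero. The names `sigma`, `sigma_prime` and `bcm_step` are hypothetical.

```python
import math

SIGMA_MAX = 20.0  # assumed bound of the sigmoid, sigma(z) in (-20, 20)


def sigma(z):
    """Assumed sigmoid: bounded in (-SIGMA_MAX, SIGMA_MAX), slope 1 at z = 0."""
    return SIGMA_MAX * math.tanh(z / SIGMA_MAX)


def sigma_prime(z):
    """Derivative of the assumed sigmoid with respect to its argument."""
    return 1.0 - math.tanh(z / SIGMA_MAX) ** 2


def bcm_step(weights, thetas, x, eta, mu, tau):
    """One sample-based update of all neurons; modifies weights and thetas in place.

    weights: list of n weight vectors m_k, thetas: list of n thresholds,
    x: input vector, eta: learning rate, mu: inhibition strength,
    tau: memory of the threshold (in iterations).
    Returns the inhibited activities of the n neurons.
    """
    n = len(weights)
    # Linear activities c_k = x . m_k
    c = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in weights]
    total = sum(c)
    # Eq. (1): each neuron is inhibited by the summed activity of the others
    args = [c[k] - mu * (total - c[k]) for k in range(n)]
    c_inh = [sigma(a) for a in args]
    # phi(c, theta) * sigma', with phi = c (c - theta)
    g = [c_inh[k] * (c_inh[k] - thetas[k]) * sigma_prime(args[k]) for k in range(n)]
    g_total = sum(g)
    for k in range(n):
        # Eq. (3) with the expectation replaced by the current sample
        coeff = eta * (g[k] - mu * (g_total - g[k]))
        weights[k] = [w_i + coeff * x_i for w_i, x_i in zip(weights[k], x)]
        # Discrete counterpart of eq. (4): running average of squared activity
        thetas[k] += (c_inh[k] ** 2 - thetas[k]) / tau
    return c_inh
```

With a single neuron and a threshold well above the current activity, the weight of the active input decreases, while the weight of a silent input stays unchanged, in line with the sign rule stated above.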

3  Results 

The network input x was a binary {1, 0} bitmap with 10 x 13 pixels. During training
and testing, noisy representatives of m = 8 artificially created letters (classes) were
presented in random order. The noise was introduced by randomly switching 5% of bits. We
chose these values of parameters: $\sigma(z) \in (-20, 20)$ with the slope equal to 1, $\eta = 10^{-4}$,
and $\tau = 200$ iterations. Training was stopped after $1.5 \times 10^6$ iterations, while
convergence happened roughly after $0.8 \times 10^6$ iterations, as judged by the stabilization
of the total sum of individual weights of all neurons. In our simulations, we investigated
the effect of different numbers of neurons n and different strengths of inhibition $\mu$. For all
examined n we arrived at the optimal value of $\mu = 1/n$.

Figure 2: Responses of one randomly chosen neuron out of n = 8 after it became selective
to one pattern (letter) out of m = 8. Each point represents the neuron's response to one
noisy test pattern. There were 100 test patterns for each letter. We can see that the
neuron became selective for one letter.
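
The way noisy representatives of a letter are produced can be sketched as follows. This is only an illustration under an assumption: the text does not say whether exactly 5% of the bits are switched in every pattern or whether each bit is switched with probability 0.05; the sketch switches a fixed number of randomly chosen bits. The function name `add_noise` and its parameters are hypothetical.

```python
import random


def add_noise(bitmap, flip_fraction=0.05, rng=random):
    """Return a noisy copy of a flat {0, 1} bitmap.

    A fixed number of randomly chosen bits, int(flip_fraction * len(bitmap)),
    is switched (0 -> 1, 1 -> 0). The input bitmap is left unchanged.
    """
    noisy = list(bitmap)
    n_flip = int(flip_fraction * len(noisy))
    for i in rng.sample(range(len(noisy)), n_flip):
        noisy[i] = 1 - noisy[i]
    return noisy


# A 10 x 13 bitmap stored as a flat list of 130 bits
letter = [0] * 130
noisy_letter = add_noise(letter, 0.05, random.Random(0))
```

For a 10 x 13 bitmap this switches 6 of the 130 bits in each presented pattern.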

With n = 8 = m neurons and no inhibition, the ANN did not classify at all. With
n = 8 neurons and $\mu = 1/n$, in 8 training runs out of 10 each neuron became selective to
one letter. The ANN classified all test patterns 100% correctly (because the noise was
very low) (Fig. 2). In the remaining 2 runs, 2 neurons became selective to the
same letter, thus one class remained unrepresented.

With n = 4 = m/2 neurons and $\mu = 1/n$, in all 10 training runs each neuron became
selective to one letter at random. Thus, since there are 8 classes (letters), the ANN classified
correctly only 1/2 of the test patterns. This result is not surprising since one BCM neuron
can find only one projection for the input data, which cannot be the same for two different
classes.

With n = 16 = 2m neurons and $\mu = 1/n$, in 8 training runs out of 10, pairs of neurons
became selective to one letter. This means that 2 neurons would give the responses shown in
Fig. 2 to the same class. In the remaining 2 runs, 3 neurons became selective
to the same letter, but it never happened that a letter remained unclassified.
4  Conclusion

We  have  investigated  the  BCM  neural  network  with  feedforward  inhibition  in  computer 
experiments  with  classification  of  letters.  When  the  number  of  neurons  was  the  same  as 
the  number  of  classes,  i.e.  n  =  m,  each  neuron  became  selective  to  a  different  letter. 
When n = 2m, and the number of training patterns was the same for each letter, we
found a typical but still interesting self-organizing tendency towards a uniform and even
representation of the input data. These results suggest that the BCM ANN can have
properties similar to those of, for instance, the Kohonen ANN [8]. This requires more investigation.
It is also desirable to experiment with a more sophisticated database of scanned
handwritten characters and to compare the performance with, for instance, the standard
back-propagation ANN. Finally, it could be interesting to program and run the BCM classifier
on a transputer system [9].
Abstract. In this paper, we briefly analyze methods used in the implementation
of complex information systems in real applications and try to point out their similarity
to the methods used for the functional specification and design of complex
microelectronic systems. Methods currently used in both cases are empirical.
They lead to acceptable but not optimal results. We would like to show in this
paper the possibilities for improving the functional specification of microelectronic
systems, especially from the point of view of optimal functional coverage of the
application field and with regard to the system's interaction with the application environment.


I.  Introduction 

Recently, we have seen a rapid development of information systems and their
implementation in nearly all human environments. Their use in industry helps to save costs,
improve the quality of the production process and decrease its dependency on the
human factor. A good information system is a well-integrated complex of
hardware and software parts implemented in a specific production-administrative
environment. Conformity with the environment and adaptability to specific demands are key
factors in the success of such a system. The methods used to implement information
systems in real environments might serve as a good example for defining general aspects of
the functional specification and design of complex microelectronic systems. We briefly
analyze the implementation of information systems in an extensive production
environment, pointing to the vertical and horizontal application structure. An important
aspect is the optimal definition of the data flow and its transformation into a clear and easily used
structure. Another important aspect is the resistance of the data structure to any kind of
distortion and to errors in the hardware structure of the system.


II.  Information  system  and  real  environment 

The positioning of an information system (IS) in a real environment (a production company) is
shown in Fig. 1. Area 1 covers the whole application environment, of which area 2
is the functional part covered by the IS and area 3 is the part functionally not covered
by the IS. Vertical lines 4 show the hierarchical structure of the environment (organizational
structure, internal processing and management decision rules). The data flow through the IS
is shown by trajectory 5. The hierarchical environment structure with its own
characteristics acts as a kind of barrier 6 to the data flow. Another barrier 7 is the border
between the IS and the functionally not covered area. The interface to the environment 9
surrounding the implementation environment 1 has a similar influence (8) on the data flow. The nonlinear
borders between different areas indicate incompatibilities, which may distort the data flow
(its fluency or contents) and in the worst case even cause data loss. The overall success of the
IS implementation does not depend only on the quality of the IS or on the characteristics of the
non-covered functional part. Even with the best-quality IS available, the overall functionality might
be insufficient due to an improper interface between the IS and the environment or an improper
hierarchical structure of the environment. The interface may be improved by
narrowing and smoothing the borders:

•  From the IS side (2) - by parametrizing (i.e. a functional change made by changing a set of
standard parameters) or by modifying the standard functionality (algorithm
modifications and data conversions)

•  From the environment side (3) - by process improvement (reengineering).


Fig. 1. The information system in real environment.

Fig. 2. Modified environment for IS implementation.


From experience gained in various implementations we can say that the best result is
obtained when a standard IS is implemented with as few modifications as possible. The
internal flexibility of the IS is used to counter environment influences. Another possibility for avoiding
environment influences is to change the hierarchical structure. This leads to a clearer and
simpler information process flow, as shown in Fig. 2. The vertical borders 4 are replaced
by the horizontal borders 10, whose direction is similar to that of the data flow trajectory 5. The
horizontal lines show the split of the whole environment into fully or partly independent processes.
In this case the internal functionality of the IS is influenced only by the borders to its
surroundings. In practice, these principles are used for instance in quality management of
production processes.

The quality verification of the IS implementation is usually made in at least two steps:

•  Functional verification with a sample data load - functional connections among all parts
are tested in this step

•  Functional verification with regard to the full data load and the real environment - process
throughput capacity is tested.

Results from the verification processes are usually used for making corrections in the IS internal
structure or for modifying interfaces.
III.  Functional  specification  of  the  microelectronic  system

The design of a microelectronic system (MS) is a multi-stage process. The highest level is
the system functional specification (behavior of the system), which is often simplified to a
“customer-supplied specification”. Then the MS is described at a behavioral level. In the
successive design steps, the description is refined to impose more structure on the design
by filling in the implementation details. At each step of the refinement process the
behavior of the refined design is verified against the preceding design.

A typical top-down design methodology would start with the design description in a
hardware description language (HDL). The principal feature of an HDL is the capability to
describe the function of a piece of hardware independently of the implementation. The great
advance with modern HDLs was the recognition that a single language could be used to
describe both the function of the design and its implementation. This allows the
entire design process to take place in a single language and a single representation of the
design. The description in HDL is compiled into a gate-level implementation by a hardware
compiler. The gate-level design is optimized using combinational and sequential optimizers.
The functionality of the implementation is checked by verification or simulation tools.
The gate-level design is mapped into transistors using technology mapping. The
transistor-level design is further simulated to analyze its critical paths. Finally, the transistors are
mapped into rectangles using place-and-route tools.

All the above-mentioned steps are currently supported by various CAD systems. For the most
complex MS, the only open question is the coverage of the required functionality. This is the
highest level, above the behavioral level. Let us look closer at this problem. If the system is
described by a descriptive tool, for instance VHDL, there exist a number of exact tools to
transform it down (to lower levels). To obtain the descriptive form means to include all
application requirements (including those not exactly defined) and to specify them. Full
functional coverage is expected, i.e. the functional specification may include over-covering, which
may be used as a spare for non-specified application requirements. In recent MS these are
self-test, self-calibration, error detection and internal compensations. An important step in confirming the
functional coverage is the functional verification from the description level, i.e. the functional
simulation at the highest level. In complex systems with huge numbers of input data combinations, formal
verification methods are used, which test the validity of the system's functional
specification, not the functionality of the system itself. The direct functional simulation of
complex systems is usually not possible due to the very long simulation time.

Let us try to apply the knowledge from the previous chapter to the methodology of
the MS functional specification. The basic parts of the MS functional specification are:

•  Making  clear  borders  of  the  MS  functional  area  (extracting  the  MS  functionality  from 
the  whole  application) 

•  Definition  of  an  interface  to  the  environment  (communication  resources  and 
communication  methods) 

•  Definition  of  an  input  and  output  data  flow  (communication  protocols  and  data 
structure) 

The structure of these parts is basically similar to that for the IS. In practice,
the question of the completeness of the MS functional specification often arises, caused by
insufficient analysis of all important influences from the application area. The functional
specification should cover, for instance, the influences of the human environment on the MS. From
this point of view we may see two groups of MS:

•  Systems  cooperating  only  with  other  MS 

•  Systems  with  interface  to  human  operator. 

While in the first case the specification of the interface and data flows is usually clear (for
too complex systems even this might not be true), in the second case we must consider
the factor of instability caused by the characteristics of the human operator. At this point,
user friendliness (simplicity, clarity) to the operator and resistance of the system
to mistakes in controlling and using it are often considered.

Methods for solving the above-mentioned problems are today mostly empirical, and
no formal theory exists. Formal specification methods have been developed only
for some specific types of MS (computers, communication systems, etc.), which mostly have
a relatively simple and clear internal structure.

From the point of view of IS positioning, the analysis of the data flow and of its distortion
by various factors can be used for the design of MS. If the MS is to conform with the application
environment, it is necessary to define the interface to this environment unambiguously. The internal
functionality of the system should over-cover the required functionality. It is necessary to allow
for some extra functionality as a provision for future needs, for example by means of
parametrizing functionality that is not exactly described. From this point of view, the definition
should be flexible and open to future extensions. We may learn from the useful reorganization of
the hierarchical structure in IS implementation and try to apply it to optimizing the internal
structure of MS. It may be used for:

•  splitting the MS into independent and partly independent processes.

•  simplification of the internal structure of the MS.

•  simplification of the interface between the MS and the environment by moving it to a more
advantageous position.

Splitting the MS into simpler parts leads to a more efficient design process, clearer functional
coverage and better conditions for verification processes.


IV.  Conclusion 

We presented the possibility of sharing knowledge between two similar areas. Under certain
conditions, the same rules and methods are used for the functional description of microelectronic
systems as for the functional definition of information systems. As the complexity of both
systems grows, the methodologies of their functional specification become similar.
Abstract. This paper deals with the validation of VHDL descriptions at the
early phase of the design of a digital system. Our approach consists in
generating test data from a given VHDL behavioral description. The
validation is achieved by comparing the results obtained by simulating
the VHDL description with the test data against the results
which should have been obtained from the specification of the system to be
designed. In this paper we propose an original approach based on software
testing concepts.


I.  Introduction 

We propose in this paper an original approach based on software testing concepts[1] in
order to validate behavioral VHDL[2] descriptions. We choose such an approach because a
VHDL description is a software program describing the behavior of a digital system. The
problem is to generate test data using software engineering concepts. Generating test data
requires solving three basic problems: (i) it is necessary to define the number of
test data to be considered (this number is called the length of the test data in the following);
(ii) it is necessary to define criteria which express the “quality” requirements that the test data
have to fulfill; (iii) it is necessary to define an algorithm for generating the test data. To
solve these problems we draw on testing techniques developed in the field of
software engineering[1]. This interest is motivated by the fact that behavioral hardware
languages such as VHDL and conventional languages such as C or ADA[3,4] are supported
by common concepts. Having selected criteria from the field of software testing that allow the
three aforementioned problems to be solved, we are now studying how such criteria
can be measured and applied to VHDL behavioral descriptions. In order to find criteria
which could estimate the length of test data and express the quality of test data, we have been
concerned with two kinds of techniques: (i) the computation of the cyclomatic complexity metric
(McCabe metric[4]) and (ii) the application of coverage-based metrics[1]. The generation of
test data is based on a powerful algorithm[5] issued from software testing methods: this
algorithm calculates the minimum number of control flow paths which are sufficient
for the generation of all possible execution paths involved in a VHDL description. The
McCabe metric and this algorithm are based on a graphical representation of the
control part of the software being tested. McCabe defined a cyclomatic number of a graph
associated with the control part of software. This number represents the number of linearly
independent paths of the graph. He proved that the cyclomatic number represents the
minimum number of test data to be generated in order to test the control part of software. In
order to evaluate the quality of test data, conventional software testing criteria are used. These
criteria correspond to coverage-based metrics[6].

The first part of the paper deals with the graphical representation we have chosen for the
control flow graph (CFG) of a VHDL description. In the second part we present in detail
the cyclomatic complexity concept and the coverage-based criteria. This part is
illustrated by pedagogical examples. In the third part we present the algorithm we
selected for generating the test data from a set of independent paths. In the last part we
present how these concepts are used in the case of VHDL descriptions. Furthermore,
we give a brief overview of the future work we envision.
II.  Graphical  Representation  of  Control  Flows

In this part we briefly present how the control flow part of a VHDL description is
modeled using graph concepts. These concepts are used: (i) to represent the sequencing of
operations involved in a VHDL description and (ii) to point out the different execution paths.
CFGs describe the structure of software modules. The definition of a module is language
dependent. In general, it is a unit of code with an entry point and an exit point. Each CFG
consists of nodes and edges. The nodes represent computational statements. Edges represent
transfer of control between nodes. This graphical description helps to understand complex
algorithms. The general structure and the CFG of a VHDL module are given in figure 1.


ARCHITECTURE BEHAVIOR OF circuit_name IS
A (source), B, C, Y :  BEGIN
    PROCESS 1 :  PROCESS WAIT ON (signal, ..)
        “Instructions describing the behavior”
    END PROCESS
    PROCESS 2 :  PROCESS WAIT ON (signal, ..)
        “Instructions describing the behavior”
    END PROCESS
    PROCESS 3 :
Z (sink) :  END BEHAVIOR

[CFG: source node A, declarations of processes, sink node Z]

Fig 1  General structure and CFG of a module in VHDL code

A module is declared with the key word ARCHITECTURE BEHAVIOR. The processes
involved in the behavioral description are executed concurrently. The execution
involves three kinds of phases: (i) the scanning phase, which detects which processes are
going to be activated; (ii) the process activation, which allows the execution of the statements
belonging to the active processes; (iii) the execution of the control structures (assignment,
repetitive or selective) involved in the VHDL description of a given process. The key words
BEGIN (A, B, C and Y nodes) and END BEHAVIOR (Z node) define the beginning and the
end of a behavioral VHDL description. The A node is called the source node of the
description. It has no incoming edge and only one outgoing edge, which leads to the B node.
The B node is used to model the scanning phase: (i) if at least one of the processes involved
in the description is active, then the following node in the execution path will be the C node;
(ii) otherwise, the node which follows node B in the execution path is the Z node (see
figure 1). The C node is the process distributor node. It points out all the active
processes. The number of outgoing edges of the C node is equal to the number of
the processes in the description. The Y node is the process junction node. It
represents the end of the execution of the active processes of the description. The
number of incoming edges of the Y node is equal to the number of the processes in
the description. The Y node has one outgoing edge, which leads back to the B node.
This outgoing edge models the fact that the scanning phase is going to happen once again.
The Z node is the sink node of the description and has no outgoing edge. Between the C and
Y nodes, we find the nodes that correspond to the description of each individual process and
that allow the modeling of the execution of the control structures (assignment, repetitive or
selective) involved in the VHDL description of a given process. Figure 2 shows the control
flow sub-graphs of such control structures.

Fig 2  Control flow sub-graphs

III.  Cyclomatic  Complexity  and  Structured  Testing

Cyclomatic complexity[7] measures the number of decisions in a single software
module. It is also known as v(G), where v refers to the cyclomatic number in graph theory and
G indicates that the complexity is a function of the graph. Given a module whose CFG has e
edges and n nodes, its v(G) is: v(G) = e - n + 2. Considering a set of several paths gives a
matrix in which columns correspond to edges and rows correspond to paths. From linear
algebra, it is known that each matrix has a unique rank (number of linearly independent rows)
that is less than or equal to the number of columns. This means that no matter how many
possible paths are added to the matrix, the rank can never exceed the number of
edges in the graph. In fact the maximum value of the rank is exactly v(G). A minimal set of
vectors (paths) with maximum rank is known as a basis. A basis can also be described as a
linearly independent set of vectors that generate all vectors in the space by linear combination.
So the size of a basis is the minimum number of paths that should be tested. Therefore v(G) is the
number of paths in any independent set of paths that generate all possible paths by linear
combination. Structured testing as presented in this sub-section applies to individual software
modules. It is simply stated: "Test a basis set of paths through the CFG of each module". This
means that any additional path can be expressed as a linear combination of paths that have
been tested. This criterion establishes a complexity number, v(G), of test paths that has two
critical properties: (i) a test set of v(G) paths can be realized; (ii) testing beyond v(G)
independent paths is redundantly exercising linear combinations of basis paths. Therefore the
minimum number of tests required to satisfy the structured testing criterion is exactly v(G). Note that
the structured testing criterion measures the quality of testing, providing a way to determine
whether testing is complete. Structured testing is more theoretically rigorous and more
effective at detecting errors in practice than other common test coverage criteria such as
statement coverage and branch coverage[8]. It is not a procedure to identify and generate test
data inputs. The independent test paths can be identified by Poole's algorithm described
below.
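
As an illustration, the formula v(G) = e - n + 2 can be evaluated on the module CFG of figure 1. The following Python sketch is only an assumption-laden example: every process is collapsed into a single node (real processes contribute further nodes and edges for their control structures), and the names `cyclomatic_complexity`, `vhdl_module_cfg` and the node labels P1, P2, ... are hypothetical.

```python
def cyclomatic_complexity(edges):
    """v(G) = e - n + 2 for a CFG given as a list of (source, destination) edges."""
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2


def vhdl_module_cfg(num_processes):
    """Edges of the module CFG with nodes A, B, C, Y, Z.

    Assumption: each process is modeled as one node P1, P2, ...
    """
    edges = [("A", "B"),   # source to scanning node
             ("B", "C"),   # at least one process is active
             ("B", "Z"),   # no process is active: go to the sink
             ("Y", "B")]   # back edge: the scanning phase happens again
    for i in range(1, num_processes + 1):
        process = f"P{i}"
        edges.append(("C", process))   # distributor to process
        edges.append((process, "Y"))   # process to junction
    return edges


print(cyclomatic_complexity(vhdl_module_cfg(3)))  # 4
```

Under this simplification a module with p processes has e = 2p + 4 edges and n = p + 5 nodes, so v(G) = p + 1.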


IV.  Poole’s  Algorithm :  A  Method  to  Determine  a  Basis  Set  of  Paths


A major problem in unit testing of programs is to determine which test data are to be
applied. One technique that is in widespread use is to take the CFG of each of the program
functions and calculate a basis set of test paths. Path construction is defined as adding or
subtracting the number of times each edge is traversed. While this is not a total solution for
test data generation, it does provide a good starting set of test data. Poole[5] gives an
algorithm for taking a function's CFG and determining a basis set of paths. Two nodes can be
either unconnected, connected by an edge in either direction or connected by an edge in each
direction. When tracing a path from the source to the sink, a back edge is an edge that leads
back to a node that has already been visited. For example, in figure 1, it is the edge that
leaves the Y node and enters the B node. A CFG contains one source node and one
sink. For example, consider a graph with 4 edges: a, b, c and d. The path ac can be represented
by the vector [1 0 1 0]. Paths are combined by adding or subtracting the paths' vector
representations. No path in the basis set can be formed as a combination of other paths
in the basis set. Also, any path through the CFG can be formed as a combination of paths in
the basis set. Figure 3 shows a simplified CFG. While a complete CFG would not have two
edges going to the same destination, this requirement has been relaxed to keep the number of
paths to a manageable size for this example. A basis set for this graph is {ac, ad, bc}. The path
bd can be constructed by the combination bc+ad-ac as shown in figure 3. The set {ac, bd} is
not a basis set, because there is no way to construct the path ad. The set {ac, ad, bd} is also a
basis set. Basis sets are not unique; thus a flow graph can have more than one basis set.
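
The vector view of paths used in this example can be checked with a short Python sketch. It is only an illustration: the helper names `path_vector` and `rank` are hypothetical, and the rank is computed by plain Gaussian elimination over exact fractions.

```python
from fractions import Fraction

EDGES = ["a", "b", "c", "d"]  # the four edges of the simplified CFG


def path_vector(path):
    """Number of times each edge is traversed, e.g. 'ac' -> [1, 0, 1, 0]."""
    return [path.count(edge) for edge in EDGES]


def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for col in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


ac, ad, bc, bd = (path_vector(p) for p in ("ac", "ad", "bc", "bd"))

# bd is the combination bc + ad - ac
combined = [x + y - z for x, y, z in zip(bc, ad, ac)]
print(combined == bd)            # True

print(rank([ac, ad, bc, bd]))    # 3: all four paths span a space of dimension 3
print(rank([ac, bd]))            # 2: {ac, bd} is too small to be a basis
```

The rank of the full path matrix equals the size of the basis sets {ac, ad, bc} and {ac, ad, bd} given above.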


FindBasis (node)
    if this node is a sink then
        print out this path as a solution
    else if this node has not been visited before
        mark the node as visited
        label a default edge
        FindBasis (destination of default edge)
        for all other outgoing edges, FindBasis (destination of edge)
    else
        FindBasis (destination of default edge).


Fig  3  Simplified  CFG  and  Demonstration  of  Path  Construction  and  Poole’s  Algorithm 

The algorithm for this basis set method is a modified depth-first search. The search
starts at the source node and recursively descends along all possible outgoing paths. If the
node visited has never been visited before, a default outgoing edge is picked, and the current
path is then split into new paths that traverse each outgoing edge, going down the default edge first.
The default edge is any edge that is neither a back edge nor an edge that later causes a node to have
two incoming edges. For example, in the test condition of a pre-test loop, the default edge
would be the edge which exits from the loop. If the edge that traverses the body of the loop
were chosen, then a back edge from the last node in the body to the test condition node would
have to be traversed later. If the node visited is a sink (no exit edges), then a path in the basis
set has been found. Otherwise (the node has already been visited), the path follows the default edge.
A pseudo-code outline of this method is shown on the right side of Figure 3. If we apply this
algorithm to a VHDL description, the result is a set of independent paths. In fact, we obtain a
number of paths equal to the cyclomatic complexity (see sections 2 and 3), together with the
corresponding edges that are traversed for each path.
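
The pseudo-code can be turned into a short runnable sketch. The version below is only an illustration under stated assumptions: the graph is a dictionary mapping each node to an ordered list of successors, and the first successor in each list is taken as the default edge. The caller is therefore assumed to order the successors so that the default edge is never a back edge (for a pre-test loop, the loop exit comes first); with a badly chosen default edge the recursion would not terminate. The names `find_basis`, `graph` and `default_edge` are hypothetical and do not come from the paper.

```python
def find_basis(graph, source):
    """Return a list of basis paths (lists of nodes) from source to the sinks.

    graph: dict mapping node -> ordered list of successor nodes.
    Assumption: the first successor of each node is its default edge
    and is not a back edge.
    """
    default_edge = {}   # node -> destination of its default edge (also marks "visited")
    paths = []

    def visit(node, path):
        path = path + [node]
        successors = graph.get(node, [])
        if not successors:
            # Sink: the current path is one member of the basis set.
            paths.append(path)
        elif node not in default_edge:
            # First visit: label the default edge, follow it first,
            # then split the path along every other outgoing edge.
            default_edge[node] = successors[0]
            visit(successors[0], path)
            for destination in successors[1:]:
                visit(destination, path)
        else:
            # Already visited: only follow the default edge.
            visit(default_edge[node], path)

    visit(source, [])
    return paths


# Pre-test loop: "t" is the test node, its default edge leaves the loop.
loop_cfg = {"s": ["t"], "t": ["exit", "body"], "body": ["t"], "exit": []}
for p in find_basis(loop_cfg, "s"):
    print(" -> ".join(p))
```

For this small graph the number of paths returned equals the cyclomatic complexity E - N + 2 = 4 - 4 + 2 = 2, in line with the property stated above.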


V.  First  Results  and  Future  Work 


Using the concepts from software engineering presented in sections 3 and 4,
it is clear that these concepts can readily be applied to the control flow representation of VHDL
descriptions presented in section 2. We propose in Figure 4 a framework for deriving test
benches for VHDL descriptions. The proposed approach is based on McCabe's cyclomatic
complexity, the structured testing method and the specification of a test bench generator
using Poole's algorithm. We are currently defining the specification of a software tool that
automatically generates test data for VHDL descriptions from the previous concepts.
For the moment, we have generated by hand input data from the paths of the CFG of a simple
VHDL description in order to execute each path. Furthermore, we have generated the
corresponding output data. The result has been expressed as a test bench, which was compiled
and simulated; this allowed us to validate a simple VHDL behavioral description.


Fig 4 VHDL behavioral description validation scheme

Abstract. Artificial Neural Networks (ANN) have become an
interesting solution for many real-life applications. Problems related
to pattern recognition, data clustering, classification, identification, etc. are
all suitable for solution by neural network methods. Usually, neural
methodology is based on specific models and algorithms. How to implement
them can be decided depending only on the application constraints.

In this work, we try to clarify the different alternatives for
the implementation of ANN, ranging from commercial software
simulators running on conventional computing platforms to more specific
and sophisticated VLSI electronic implementations.


I.  Introduction. 

Artificial  Neural  Networks  (ANN)  belong  to  the  category  of  intelligent  systems  able  to 
manage  information  in  a  way  resembling  more  or  less  the  nervous  system.  The  main 
properties  of  a  neural  computing  system  are  the  following: 

•  Ability to learn from examples (no prior knowledge about the problem is
required).

•  Ability  to  generalise  from  learnt  examples. 

•  Ability  to  predict  with  new  data. 

•  Possibility  to  deal  with  missing  information  or  noisy  environments. 

ANN process data in a massively parallel way, involving many elementary processors
interconnected through different paths. The high intrinsic parallelism of these systems is
loosely inspired by biology, and information is processed using specific
algorithms. These algorithms implement learning and processing phases on particular
architectures, each composed of several layers of processing elements with
interconnection paths between them.

Figure 1 shows such a structure for the particular case of the Multilayer Perceptron
(MLP) network. In this ANN, learning is performed using the Backpropagation or a
similar algorithm. The process consists of presenting the input data to the neural
structure and then comparing the known answer with the predicted
one. If there is a difference between the two, the weights are adjusted to reduce the error. This
process is repeated for all examples until the best solution is found.
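
The present/compare/adjust cycle can be illustrated on the simplest possible case. The sketch below is not backpropagation: it trains a single threshold neuron with the perceptron learning rule, which follows the same supervised scheme on one layer. The function names, the learning rate and the number of epochs are assumptions chosen for the example.

```python
def predict(weights, inputs):
    """Threshold unit: weighted sum plus bias (stored as the last weight)."""
    s = sum(w * x for w, x in zip(weights, inputs)) + weights[-1]
    return 1 if s > 0 else 0


def train_neuron(samples, rate=0.1, epochs=100):
    """samples: list of (inputs, target) pairs with targets 0 or 1."""
    weights = [0.0] * (len(samples[0][0]) + 1)
    for _ in range(epochs):
        for inputs, target in samples:
            # Compare the known answer with the predicted one.
            error = target - predict(weights, inputs)
            if error:
                # Adjust the weights to reduce the error.
                for i, x in enumerate(inputs):
                    weights[i] += rate * error * x
                weights[-1] += rate * error
    return weights


# Logical AND is linearly separable, so a single neuron can learn it.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train_neuron(and_samples)
print([predict(w, x) for x, _ in and_samples])
```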


Typical  Neural  Network 


Fig  1.  MLP  structure  with  supervised  learning  scheme. 


The above learning algorithm is a good example of so-called Supervised Learning,
as opposed to Unsupervised Learning, which is typically employed with SOM (Self-Organised
Maps) ANN.

The SOM structure is mainly composed of one input layer and one output layer, with a
two-dimensional cluster layer of interconnected elements between them. The learning process is
performed by presenting data to the input layer and identifying the elements
in the cluster layer with the most similar profile. Weights are adapted with a reinforcement
technique.

After many iterations over the presented data, the structure self-organises in such a
way that similar inputs produce similar outputs after processing. In effect, the result is a
clustering process in the two-dimensional layer. To use the SOM network, inputs are
presented to the trained ANN and the resulting location on the cluster layer indicates the input
group.
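
One SOM learning step can be sketched as follows. This is a minimal illustration, assuming a squared Euclidean distance to select the most similar unit and a city-block neighbourhood on the grid; the names `best_matching_unit` and `som_update`, the learning rate and the radius are hypothetical choices, not taken from the paper.

```python
def best_matching_unit(weights, x):
    """Return (row, col) of the cluster-layer unit whose weights are closest to x."""
    best, best_dist = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def som_update(weights, x, rate=0.5, radius=1):
    """Move the winning unit and its grid neighbours toward x (in place)."""
    br, bc = best_matching_unit(weights, x)
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            if abs(r - br) + abs(c - bc) <= radius:
                for i in range(len(w)):
                    w[i] += rate * (x[i] - w[i])
    return br, bc


# 2x2 cluster layer with two-dimensional weight vectors.
grid = [[[0.0, 0.0], [0.0, 1.0]],
        [[1.0, 0.0], [1.0, 1.0]]]
winner = som_update(grid, [0.9, 0.9], rate=0.5, radius=0)
print(winner, grid[1][1])
```

Once training is finished, calling `best_matching_unit` on a new input returns the location on the cluster layer that indicates the input group.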

Another very useful ANN model is the RBF (Radial Basis Function) network. It consists of
two layers. The first one (hidden layer) is constructed using a set of basis functions whose
centres correspond to prototype vectors in the input space. The basis functions are usually
chosen to be un-normalised Gaussians. The output units (second layer) form a linear
combination of the basis functions computed by the hidden units.

The learning process in RBF ANN is not completely fixed, and takes place in two
main phases. First, it is necessary to determine the positions of the Gaussian functions along
with their shape and characteristics. This is done by means of a clustering method, together
with a proper selection of parameter values in accordance with the internal organisation of the
data representing the problem. These parameters will vary depending on the data density in
different regions of the input space. Setting the weights in the second layer is a linear
optimisation task, and can be done using conventional matrix methods or iteratively using the
Least Mean Square (LMS) algorithm.
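
The two layers and the second training phase can be sketched in a few lines. The example assumes that the first phase is already done: the Gaussian centres and a common width are simply given by hand instead of being found by clustering. The output weights are then set iteratively with the LMS rule. All names and parameter values (`rbf_hidden`, `train_output_lms`, `rate`, `epochs`, the XOR data) are illustrative assumptions.

```python
import math


def rbf_hidden(x, centres, width):
    """Hidden layer: one un-normalised Gaussian per centre."""
    return [math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2.0 * width ** 2))
            for c in centres]


def rbf_output(x, centres, width, weights):
    """Output layer: linear combination of the basis functions plus a bias."""
    h = rbf_hidden(x, centres, width) + [1.0]
    return sum(w * hi for w, hi in zip(weights, h))


def train_output_lms(samples, centres, width, rate=0.5, epochs=1000):
    """Second phase only: LMS adaptation of the output weights."""
    weights = [0.0] * (len(centres) + 1)
    for _ in range(epochs):
        for x, target in samples:
            h = rbf_hidden(x, centres, width) + [1.0]
            error = target - sum(w * hi for w, hi in zip(weights, h))
            weights = [w + rate * error * hi for w, hi in zip(weights, h)]
    return weights


# XOR is not linearly separable in input space, but it is after the Gaussian layer.
xor = [((0, 0), 0.0), ((0, 1), 1.0), ((1, 0), 1.0), ((1, 1), 0.0)]
centres = [(0, 0), (1, 1)]   # assumed result of the clustering phase
w = train_output_lms(xor, centres, width=0.5)
print([round(rbf_output(x, centres, 0.5, w), 2) for x, _ in xor])
```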


II.  Models  and  identification  of  operators. 


In  the  above  section  we  introduced  the  basic  models  for  ANN.  Usually  the  ANN 
application  is  limited  to  a  small  number  of  structures  and  associated  learning  algorithms.  By 
expanding  this  set  of  ANN,  it  is  possible  to  obtain  a  great  variety  of  solutions  with  the 
resulting  combination  of  the  particular  properties. 

The  set  of  ANN  structures  most  used  in  practice  is  the  following  [1]: 

•  Perceptron  (Single  Layer  Perceptron). 

•  Multilayer  Perceptron  (MLP). 

•  SOM  (Self-Organised  Maps). 

•  RBF  (Radial  Basis  Functions). 


The  associated  basic  strategies  for  training  and  execution  phases  are  listed  in  Table  I: 


ANN                       Learning       Algorithm                  Operations (learning and execution)

Single Layer Perceptron   Supervised     Perceptron learning rule   Product and Addition; Non-linearity
MLP                       Supervised     Backpropagation            Product and Addition; Non-linearity
SOM                       Unsupervised   Reinforcement              Distance calculation; Product and Addition
RBF                       Mixed          Clustering, LMS            Distance calculation; Exponentiation; Product and Addition

Table I. Basic ANN structure and characteristics.


Looking at the table, we can observe that only a few operations are identified,
and that the required precision can differ considerably depending on the phase (learning or
execution). Usually, a precision equivalent to 8 bits is enough for the execution phase,
while 16 bits may be necessary for the training process when an error function optimisation
procedure is used (the backpropagation algorithm, for example). In summary, the computational
requirements for a platform implementing an ANN are: addition, product, non-linearity
computation, distance calculation and exponentiation. All of them are executed collectively,
within the frame of a given structure with specific data paths.
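
The effect of the word length on the basic product-and-addition operation can be illustrated with a small fixed-point sketch. It assumes signed values in the range [-1, 1] and symmetric rounding to the nearest code; the function names and the sample values are hypothetical and only serve to compare 8-bit and 16-bit precision.

```python
def quantise(value, bits=8, full_scale=1.0):
    """Round value to a signed integer code with the given number of bits."""
    levels = 2 ** (bits - 1) - 1            # 127 for 8 bits, 32767 for 16 bits
    code = round(value / full_scale * levels)
    return max(-levels, min(levels, code))  # saturate outside the range


def weighted_sum_fixed_point(inputs, weights, bits=8):
    """Product and addition on integer codes, rescaled to real units at the end."""
    levels = 2 ** (bits - 1) - 1
    acc = sum(quantise(x, bits) * quantise(w, bits) for x, w in zip(inputs, weights))
    return acc / (levels * levels)


inputs, weights = [0.5, -0.25], [0.8, 0.4]
exact = sum(x * w for x, w in zip(inputs, weights))
for bits in (8, 16):
    approx = weighted_sum_fixed_point(inputs, weights, bits)
    print(bits, "bits: error =", abs(approx - exact))
```

In this example the 16-bit result is much closer to the exact weighted sum than the 8-bit one, which gives a feel for why the training phase may need the longer word.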

III. Technological solution for ANN implementation.

In 1987, Lippmann published a fundamental review article on the field [2]. He wrote:
"the greatest potential of neural nets remains in the high-speed processing that could be
provided through massively parallel VLSI implementations". Nowadays, this view is no longer
prevalent, because most neural network applications are developed, tested and run on PCs or
standard workstations (in fact, serial-processing machines). This is possible because of the
capabilities and working frequencies of present microprocessors.

From a general point of view, and taking into account the state of the art in electronic
technology, we can summarise that there exist many ways for real ANN implementation:

•  Analogue  implementations 

•  Digital  implementations 

•  Stand-alone  neurocomputers  (with  custom  or  commercial  processors) 

•  Accelerator  boards  (with  custom  or  commercial  processors) 

•  Chips  with  on-chip  controllers. 

•  Microprocessor  peripherals 

•  Stand-alone  on-board  applications. 

In  the  literature  we  can  find  many  excellent  works  with  accurate  reviews  on  hardware 
for  ANN,  especially  for  the  digital  alternative  [3,4,5].  The  analogue  implementations  are 
reviewed  in  [6],  with  an  extension  to  mixed  (analogue/digital)  solutions.  Table  II,  taken  from 
[6]  shows  the  digital  vs.  analogue  comparison  in  terms  of  main  merit  factors. 


Merit factor               Digital            Analogue

Area                                          *
Power                                         *
Speed                                         *
Memory                     *                  Difficult
Noise                      Quantisation       Random
Technology independence    *
Repeatability              *
Scalability/Modularity     *
Precision/Dynamic range    Limited by area    Tech. and design
External interface         Numerical data     Physical signals

Table II. Digital vs. Analogue neural processing comparison.

Analogue is very attractive in terms of area, speed and power consumption. For
these reasons, analogue can be considered a suitable candidate for high-speed and
low-power applications. On the other hand, digital solutions benefit from reliable and
well-established memory cells, good technology independence, repeatability of the operations, and
easy scalability and modularity.
Interfacing with external real world physical signals is easier in the analogue domain, but
synchronisation, control and communication processes are more straightforward for digital
processors. The most suitable real world implementation or realisation of an ANN application
will depend on many conditions. Figure 2 summarises this concept.


[Figure 2 block labels: Real problem; Database; Number of samples; Dimension; Preconditioning; ANN model; Solution; Real external constraints; Size; Power consumption; Real time; Structure; Learning and execution algorithms]


Fig 2. Map for real ANN implementation.


When considering a real world problem described by a database, and after a proper
selection of the most suitable ANN model, it is possible to take into account the real external
constraints of the application and decide on the best solution.

In  our  opinion,  the  most  restrictive  conditions  are: 

•  Size and power consumption. When these conditions are dominant, it will be
necessary to consider analogue or mixed solutions. The main difficulty here could be
the learning process and the information storage strategies.

•  Real-time. In real world problems, this condition could be very important, and it is
possible to satisfy it in a great variety of ways. In fact, the meaning of "real-time" is
application dependent, and it can range from nanoseconds to seconds or minutes.
The concept can be very different from one application to another.

Nowadays it is possible to obtain real-time characteristics with general-purpose
microprocessors (such as those from Intel, SUN or Motorola) because of their working frequency,
memory management capabilities and input/output interface properties [7].
Platforms based on these processors are very good solutions for ANN implementation.
Additionally, we will need a software package for the proper emulation of our ANN model.
The software must offer suitable data management facilities.

The criteria for selection of the platform [1] will be price, availability, ability to run the
chosen software, and the computing power required for the training process. We must ensure that:

•  It  has  sufficient  processing  power  to  cope  with  the  computations  required  during 
ANN  training. 

•  It  has  the  disk  capacity  to  hold  the  expected  volumes  of  data,  including  both  the 
training  data  and  the  network  parameters  (weights  and/or  centre  positions  and 
widths). 

•  It  has  any  special  interface  hardware  required  during  the  data  collection  phase. 

For the software packages themselves, it is possible to consider the use of a
commercial or public domain solution [8], or alternatively, to write one's own software
according to specific needs.

If the requirements can be met with development software on a general-purpose
processing platform, this can be the best solution for implementation. The main reasons are:

•  It is an open solution.

•  The time needed to reach a solution is short.

•  It is flexible.

•  It allows continuous updating.

•  The user interface is optimal.


IV.  Specific  approaches  and  new  possibilities. 

When the requirements of a specific problem exceed the overall capabilities of
general-purpose platforms in terms of speed and adaptation to a specific ANN structure, it
could be necessary to consider special-purpose hardware solutions. As stated in the previous
section, we have two main families of solutions:

•  Accelerator systems, available for standard PCs and workstations and based on
general purpose microprocessors or, alternatively, on special VLSI custom
processors well adapted for the execution phase of some ANN algorithms. In the
literature [3] we can find examples of such realisations: ANZA plus from
Hecht-Nielsen Co., a board based on the Lneuro chip from Philips, accelerators around the MA16
chip from Siemens, the ZISC board from IBM-France, ...

•  Special-purpose systems, generally based on specific VLSI designed neurochips.
Some well known examples are [3,4]: the SYNAPSE-1 machine from Siemens, the
CNAPS engine from Adaptive Solutions Inc., MANTRA from EPFL in
Switzerland, ...

Special-purpose systems are very expensive and, generally, they present some problems
with the user interface. In conclusion, we have serious doubts about their usefulness except in
very specific cases with very strong boundary conditions.

Looking at new possibilities based on current progress in electronic technology, a new and
promising scenario is offered by reconfigurable electronic systems built around FPGA (Field
Programmable Gate Array) devices. The capabilities of the coming devices from many
manufacturers, like Xilinx and Altera, allow the realisation of systems equivalent to thousands
of logic gates. It is easy and relatively fast to implement operations related to ANN
computation on a chip, with the additional opportunity to implement different ANN
structures on the same basic hardware (reconfigurability).

One recent example of these possibilities is the solution proposed by M. Chiaberge [9]
from Politecnico di Torino. The OpenDSP system is able to implement several ANN models,
and it is based on a Texas DSP processor together with an FPGA device from Altera. The
complete system is composed of several boards connected by means of a proprietary bus.


Fig  3.  Overview  of  OpenDSP  board  (Politecnico  Torino) 

The internal configuration of FPGA devices, with limited routing resources, forces the
consideration of special solutions for implementing arithmetic operators, such as on-line
arithmetic [10]. These specific design techniques, together with the possibilities offered by the
very new dynamically reconfigurable devices such as those of the FIPSOC family (Field
Programmable System on a Chip) [11,12], give us the opportunity for compact solutions in
the field of ANN model realisation. FIPSOC is a special system-on-a-chip composed of a
digital FPGA part (equivalent to 5000 gates), a microcontroller core (standard 8051) and an
analogue part (fully differential amplifiers, comparators, 8 bit A/D and D/A converters). Figure
4 shows the organisation of the device.

V.  Conclusion. 

This  paper  has  discussed  different  current  ways  for  the  implementation  of  ANN 
structures  for  real  world  applications  domain.  Special  attention  must  be  paid  to  the 
characteristics  of  present  commercial  platforms  based  on  novel  microprocessor  devices. 


Fig 4. Block diagram of FIPSOC device.

Abstract. This paper presents a novel mixed-signal VLSI focal-plane array
processor performing the morphological operation of skeletonization of binary
images in real time. The chip exploits the inherent parallelism of the grassfire
algorithm by using a massively parallel mixed-mode array, achieving high
computational speed and relatively low power consumption. The system is a
two-dimensional array of identical processing elements (PEs) with integrated
photodetectors allowing parallel optical input. The skeleton computation is
performed in a fully parallel, asynchronous manner by local interactions between
PEs; the transformed image is represented by analog values. The output of the
transformation can be serially accessed with an on-chip raster-scan circuit
generating a multiscanning-compatible video signal. The system was physically
designed and prototypes of a 22x26 array were fabricated in a standard 0.7 µm
CMOS process. System and layout design considerations as well as an output data
sample of the fully functional fabricated chip are presented.


I.  Introduction 

Our  efforts  on  the  VLSI  implementation  of  a  skeletonization  system  are  motivated  by  the 
demand  for  high  computational  throughput  at  low-power  and  small  dimensions  [1].  The 
skeletal  representation  compresses  topological  information  of  shapes,  and  has  been  recognized 
as  a  useful  pre-processing  step  in  image  recognition  tasks  [2].  Results  based  on  psychophysical 
measurements  suggest  that  the  human  visual  system  extracts  ‘skeletons’  as  descriptor  of  planar 
objects  [3].  We  have  chosen  a  low-cost  standard  digital  CMOS  process  to  implement  a 
skeletonization  chip. 

The  skeletonization  algorithm  developed  by  us  is  based  on  the  grassfire  method  proposed 
by Blum aimed at eliciting the Symmetric Axis Transform (SAT) of planar shapes [4]. Our
method  produces  a  greyscale  skeleton  which  is  a  generalization  of  the  traditional  binary  one. 
This  ‘generalized  skeleton’  can  be  extracted  by  monitoring  an  activation  spreading  process  - 
the  ‘grassfire’  -  implemented  on  a  multilayer  network.  The  inherent  parallelism  of 
skeletonization  by  the  grassfire  method  can  be  exploited  by  building  a  VLSI  system  consisting 
of  a  2D  array  of  identical  processing  elements  collectively  processing  the  input  data.  A 
dedicated  parallel  VLSI  chip  can  solve  the  generalized  skeleton  computation  in  a  few 
microseconds  at  low  power  consumption,  in  a  small  volume.  Analogue  processor  arrays  for 
basic  morphological  operations  have  been  already  presented  by  others  [5]. 

II. A Neural Model of the Grassfire Transformation

[Figure 1 labels: Detector Layer; Detector Connections; Lateral Connections; Interneurons; Hexagonal Neural Grid; Spatial Filter Connections; Orthogonal Pixel Grid]

Fig 1 Architecture of the grassfire transformation

The  architecture  of  our  grassfire  transformation  model  is  a  multilayer  network  (Fig.  1). 
The  algorithm  of  the  transformation  consists  of  three  main  steps:  (i)  set  the  initial  grassfire 
activation  on  the  neural  grid  according  to  the  figure-ground  separated  input  image,  (ii)  let  the 
grassfire  activity  propagate  from  the  boundary  points  of  the  shape  along  the  lateral  connections 
between  adjacent  neurons,  (iii)  detect  the  incoming  and  outgoing  activity  to/from  each  neuron. 
A constant propagation speed of the grassfire wavefront can be achieved by the self-excitatory
behaviour of the neurons. To monitor the propagation process, a processing unit (or
interneuron) is assigned to each lateral connection. The interneural activations are summed up and
time-integrated on the output detectors. Those points where at least two firefronts collide are
skeleton (symmetry) points, since they are equidistant from two boundary points. Skeleton
points are marked by positive detector values, because the sum of activity inflow is greater
than the outflow. A single passing firefront results in a zero detector value, since activity inflow and
outflow cancel each other.

After  saturation  of  the  grassfire  activity  over  the  neural  grid,  the  activity  distribution  over 
the  detector  layer  carries  the  output  of  the  transformation  (i.e.,  the  generalized  skeleton). 
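
The inflow/outflow bookkeeping can be imitated in software. The sketch below is a simplified stand-in and not the network described above: it uses a 4-connected orthogonal grid instead of the hexagonal grid, computes the arrival time of the fire front with a breadth-first search from the background, and defines the detector value of a figure pixel as the number of neighbours the front arrives from minus the number of neighbours it moves on to. The function name and this detector definition are assumptions made for the illustration.

```python
from collections import deque


def grassfire_skeleton(image):
    """image: list of rows of 0/1 (1 = figure). Returns (arrival, detector)."""
    rows, cols = len(image), len(image[0])
    neighbours = ((1, 0), (-1, 0), (0, 1), (0, -1))

    # Arrival time of the front: 0 on the background, then one step per pixel.
    arrival = [[0 if image[r][c] == 0 else None for c in range(cols)] for r in range(rows)]
    queue = deque((r, c) for r in range(rows) for c in range(cols) if image[r][c] == 0)
    while queue:
        r, c = queue.popleft()
        for dr, dc in neighbours:
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols and arrival[rr][cc] is None:
                arrival[rr][cc] = arrival[r][c] + 1
                queue.append((rr, cc))

    # Detector value: activity inflow minus activity outflow at each figure pixel.
    detector = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            t = arrival[r][c]
            if image[r][c] != 1 or t is None:
                continue
            inflow = outflow = 0
            for dr, dc in neighbours:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    if arrival[rr][cc] == t - 1:
                        inflow += 1
                    elif arrival[rr][cc] == t + 1:
                        outflow += 1
            detector[r][c] = inflow - outflow
    return arrival, detector


# A 3x5 rectangle surrounded by background.
img = [[0, 0, 0, 0, 0, 0, 0],
       [0, 1, 1, 1, 1, 1, 0],
       [0, 1, 1, 1, 1, 1, 0],
       [0, 1, 1, 1, 1, 1, 0],
       [0, 0, 0, 0, 0, 0, 0]]
for row in grassfire_skeleton(img)[1]:
    print(row)
```

Pixels crossed by a single front get a zero detector value, while pixels where fronts meet get a positive value, so the positive entries form a greyscale skeleton in the spirit of the description above.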

The implemented VLSI system is an array of identical PEs consisting of four functional
layers of circuitry (see Fig 2). The PEs are arranged on a hexagonal grid with local non-linear
connections. Replacing the linear lateral connections with hard-limiting ones causes
rotational dependence of the computed skeleton, but it does not affect system-level
functionality and allows a much more efficient VLSI implementation. The photodetector
circuitry senses the image focused onto the chip and binarizes it into figure and ground points via a
single thresholding. The initial grassfire activation distribution is set as high and low voltages
on a capacitor array integrated in the processing circuitry according to the thresholded image.
The processing circuitry propagates the ‘grassfire’, while the detector circuitry monitors the
propagation process and encodes the resulting skeletonized image as a voltage distribution over a
capacitor array. The output of the transformation can be serially accessed by an on-chip
scanner circuit [6], enabling real-time visualization on a commercial VGA-compatible
computer display.

IV.  Physical  Design 

The PE cell as well as the necessary input/output interfaces to the processing array were
physically designed using a standard digital 0.7 µm dual metal CMOS process (ES2 ecpd07).
The PE cell is laid out in a rectangular area (approx. 120x140 µm) with an aspect ratio of
approximately √3/2 to build up the required hexagonal tessellation of the plane. The entire
surface except over the photosensing device is covered with the second metal layer (see Fig 3)
in order to protect the circuit parts susceptible to light-induced currents (e.g. MOS transistors
operated in subthreshold mode). This shielding also serves as power routing.

V.  Evaluation 

A 22 by 26 test array was fabricated using the Europractice low-cost Multi-Project-Wafer
prototyping service, consuming approximately 20 mm² of silicon area. The chip package is closed
with a transparent lid for the optical image projection. The original binarized input image and
its grey-scale skeletonized transform are serially read out by means of a minimum number of
off-chip components. The test setup is shown in Fig 3. Fig 4 shows the image of a rectangle and
its skeleton as computed by the chip.


Fig  3  (left)  Scanning  electron  microscope  photograph  of  the  PE  on  the  fabricated  chip 
around  the  photodiode,  (right)  Test  panel  for  chip  testing.  The  chip  is  surrounded  by 
biasing  potentiometers  and  a  video  buffer  (front  panel).  Image  is  projected  onto  the 
chip  with  a  camera  lens  (black  object  in  the  middle). 

Fig 4 Image of a rectangle optically projected onto the chip (left) and its skeleton as
computed by the same VLSI (right). Visualization on a standard VGA monitor,
snapshot taken with a conventional camera.

The chip consumes 100 mW and its computational power efficiency (excluding
input/output) is 0.25 × 10^12 Op/J.

The measurements confirmed that the chip is fully operational and capable of computing the
generalized skeleton of binary images. By exploiting the inherent parallelism of the algorithm in
an asynchronous mixed-mode implementation, an estimated processing speed of
10 kframes/second can be achieved. The combined use of analog and digital computation at the
pixel level offers a compact design compared to pure digital or analog implementations.
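
The quoted figures can be combined into a rough throughput estimate. The calculation below is only arithmetic on the numbers given in the text (100 mW, 0.25 × 10^12 Op/J, 10 kframes/second, 22 × 26 PEs). It assumes that the whole power consumption can be attributed to computation, which the text does not state, and the function name is hypothetical.

```python
def throughput_estimate(power_w, efficiency_op_per_j, frame_rate_hz, n_pes):
    """Return (operations per second, per frame, per PE and frame)."""
    ops_per_second = efficiency_op_per_j * power_w   # Op/J times J/s
    ops_per_frame = ops_per_second / frame_rate_hz
    ops_per_pe_per_frame = ops_per_frame / n_pes
    return ops_per_second, ops_per_frame, ops_per_pe_per_frame


ops_s, ops_frame, ops_pe = throughput_estimate(0.100, 0.25e12, 10e3, 22 * 26)
print(f"{ops_s:.3g} Op/s, {ops_frame:.3g} Op/frame, {ops_pe:.0f} Op per PE and frame")
```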

The  development  of  the  photoreceptor  circuit  is  now  under  consideration  in  order  to 
solve  the  figure-ground  segregation  problem  under  inhomogeneous  illumination  conditions. 
We  also  plan  to  investigate  the  possibilities  of  the  integration  of  a  skeletonization  array  with 
further  on-chip  processing  for  optical  character  recognition. 

Abstract

Although we live surrounded by computers and other digital circuits,
there are still applications that require some analogue parts. The neural
network domain is one of them. One of the biggest problems is storing
the weight voltage in analogue form with the desired accuracy and
without degradation over time. This article introduces a method to solve this
problem.


1.  Introduction 

By their nature, analogue artificial neural networks work on analogue
signals, amplifying and adding them to obtain the desired output. They have to memorize the weight
of each synapse in some way, and different methods are used depending on the physical quantity
involved. A very efficient way is to use digital circuits. The weight can be stored in a digital memory
and reproduced with a DAC. The accuracy of this solution depends on the resolution of
the DAC used, which can be up to 24 bits, but this solution would need a DAC in every
synapse. A very efficient and precise method is to use a sample-and-hold circuit in every
synapse and an analogue demultiplexer at the DAC output. This article explains how to apply
such weight storage in analogue neural networks.


2.  The  digital  refresh 

This circuit is based on the long-established method of storing digital
information. The time degradation is negligible in this type of memory, because the
voltage read from a memory cell is “compared to a reference value” and the result is one
bit at the log.L or log.H level. Several bits are then applied to a digital-to-analogue
converter, which generates an analogue voltage. The DAC, with a latch at its
input, holds the corresponding voltage at its analogue output. To use only one DAC for several
synapses, it is necessary to add an analogue demultiplexer to the output of the DAC and a
sample-and-hold circuit to each synapse. Fig. 1 shows the block schematic of the refresh
circuit. The system consists of a clock generator, an address generator, a binary-to-decimal
decoder, a memory, a digital-to-analogue converter and an analogue demultiplexer.
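
The refresh principle can be sketched as a small behavioural simulation. The model is an assumption made for illustration only: an ideal DAC, an ideal demultiplexer, and sample-and-hold capacitors that leak exponentially with a single time constant. The names `simulate_refresh`, `slot`, `tau` and `v_ref` are hypothetical.

```python
import math


def simulate_refresh(codes, bits=8, v_ref=1.0, tau=1.0, slot=0.0001, cycles=3):
    """Round-robin refresh of one S&H capacitor per synapse.

    codes: digital weights stored in the memory, one word per synapse.
    slot:  time spent on each address (seconds); tau: leakage time constant.
    Returns (held voltages, worst droop seen just before a refresh, DAC step).
    """
    n = len(codes)
    lsb = v_ref / (2 ** bits)               # elementary step of the DAC
    held = [0.0] * n
    worst_droop = 0.0
    for step in range(cycles * n):
        address = step % n                  # address generator: continuous counter
        # Every capacitor leaks during the time slot.
        held = [v * math.exp(-slot / tau) for v in held]
        if step >= n:                       # skip the initial programming cycle
            worst_droop = max(worst_droop, codes[address] * lsb - held[address])
        # Memory word -> DAC -> demultiplexer -> selected S&H capacitor.
        held[address] = codes[address] * lsb
    return held, worst_droop, lsb


held, droop, lsb = simulate_refresh([128, 64, 255, 10])
print("worst droop:", droop, "DAC step:", lsb)
```

With these example values the droop between two refreshes stays well below one DAC step; increasing `slot` or the number of synapses makes it exceed the step, which corresponds to the loss of precision discussed in the following sections.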

Fig. 1: Block diagram of the refresh circuit


3.  The  digital  subsystem 

The main part of the digital subsystem is the memory. The number of bits in a word
depends on the accuracy of the digital-to-analogue converter and of the sample-and-hold circuits.
It can be 8 to 24 bits, depending on the technology used and the desired accuracy. The number of
words depends on the size of the network, that is, on the number of synapses. It
should not be too high because of the refresh rate, which depends on the sample-and-hold
circuits used (quality of capacitors, leakage) and on the technology. If the refresh rate is too low, the
voltage at the output of the S&H circuit is not stable enough and the precision of the neural
network decreases. This also happens when a high-resolution DAC is used,
the refresh rate is already limited by the technology, and many synapses are
connected to the circuit. The quantisation level of the DAC is then very small, and the
voltage on the S&H capacitor, decreasing with time, reaches the lower weight values.

The refresh circuit has an input bus for programming the memory and for connecting further
circuits to the programmer, so several refresh circuits can be used in one neural network.
The advantage is a higher refresh rate, a more stable weight voltage and a more accurate response
of the neural network.

The main advantage of this refresh solution is that the digital subsystem is completely
independent of the neural network, so it can operate at the highest possible frequency and the
system is very easily programmable.

Another important part of this subsystem is the address generator. This circuit generates
the address for the memory, from which the output data are provided to the DAC. The
generator is an n-bit continuously running counter. The number of bits of the counter and the
width of the address bus depend on the number of synapses.


3.  The  analogue  subsystem 

The  analogue  subsystem  consists  of  the  digital  to  analogue  converter  and  the  analogue 
multiplexer,  which  is  actually  the  input  part  of  the  S&H  circuit. 

The multiplexer has two inputs: the analogue input and the digital address bus. The
analogue input is connected to the DAC and provides the output voltage to the capacitors.
The digital address bus is a decimal input bus that controls the analogue switches. The
system charges the capacitors with a voltage pulse from the analogue input. The refresh rate
(see equation (1) for $f_{min}$) must be high enough that each capacitor is refreshed before its
voltage has decreased by one quantisation level of the DAC.


$$f_{min} = -\frac{n}{\tau \, \ln\left(1 - \dfrac{\Delta u}{u_0}\right)} \qquad (1)$$


where $\Delta u$ is the elementary step of the DAC,

$\tau$ is the time constant of the capacitor,
$n$ is the number of synapses
and $u_0$ is the weight voltage.
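The refresh requirement can be checked numerically. The sketch below assumes a simple exponential discharge of the hold capacitor, u(t) = u0 * exp(-t / tau), and that all n capacitors are refreshed one after another. The function names, the time constant of 1 s and the count of 16 synapses are illustrative assumptions, not values from the paper; only the 8-bit DAC, the 3.3 V reference and the 1.5 V weight come from the text.

```python
import math

def quantisation_level(v_ref: float, bits: int) -> float:
    """Elementary DAC step: reference voltage divided by the number of codes."""
    return v_ref / (2 ** bits)

def droop_time(tau: float, delta_u: float, u0: float) -> float:
    """Time for u0 * exp(-t / tau) to fall by delta_u (assumed discharge model)."""
    return -tau * math.log(1.0 - delta_u / u0)

def min_clock_frequency(n_synapses: int, tau: float, delta_u: float, u0: float) -> float:
    """Clock rate needed to revisit each of n capacitors within one droop time."""
    return n_synapses / droop_time(tau, delta_u, u0)

delta_u = quantisation_level(3.3, 8)                    # about 0.0129 V
t_max = droop_time(tau=1.0, delta_u=delta_u, u0=1.5)    # tau = 1 s is a made-up value
print(f"step = {delta_u:.4f} V, droop time = {t_max * 1e3:.2f} ms")
print(f"min clock for 16 synapses = {min_clock_frequency(16, 1.0, delta_u, 1.5):.0f} Hz")
```

With a real leakage time constant in place of the made-up one, the same three functions give the slowest clock that still keeps every weight within one DAC step.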

The multiplexer does not have any built-in intelligence, so the binary-to-decimal decoder
must ensure that only one output is at logic high level at any time. The simulation
results of a 4-bit decoder are shown in fig. 2.
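The interplay of the address counter, the memory and the decoder can be sketched in a few lines. This is a behavioural illustration only, under the assumption that the counter simply steps through all memory words in order; the helper names are hypothetical.

```python
def address_bits(n_synapses: int) -> int:
    """Counter width needed to address n_synapses memory words."""
    return max(1, (n_synapses - 1).bit_length())

def one_hot_decode(address: int, n_outputs: int) -> list[int]:
    """Binary-to-decimal decoder: exactly one output line is high."""
    if not 0 <= address < n_outputs:
        raise ValueError("address out of range")
    return [1 if i == address else 0 for i in range(n_outputs)]

def refresh_cycle(memory: list[int]):
    """Yield (switch_lines, dac_code) for one pass of the address counter."""
    for address, code in enumerate(memory):
        yield one_hot_decode(address, len(memory)), code

# Example: three stored weight codes, refreshed one after another.
for lines, code in refresh_cycle([10, 200, 37]):
    print(lines, code)
```

Each yielded pair corresponds to one clock step: the DAC receives the stored code while a single analogue switch is closed.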




Fig. 2.:  The  output  of  the  decoder 


An example of the S&H circuit simulation is shown in fig. 3. A small number of synapses
was used, just to demonstrate the functionality of the system. The quantisation level of the
8-bit DAC with a 3.3 V voltage reference was approximately 0.0129 V. The next refresh pulse
must come before the voltage of the capacitor decreases below 1.487 V (for a 1.5 V weight),
see fig. 3. Unfortunately, the parasitic resistance of such a structure was not known to us, so in
the simulation we first used a dummy NMOS transistor and then a 5 GΩ resistor. The result is
satisfactory: at a 300 µs (3.34 kHz) refresh rate the voltage difference was 0.004 V, about three
times smaller than the quantisation level of the DAC.


Fig.  3.:  Simulation  results  of  the  S&H  circuit 


Conclusions 

The presented circuit can also be used to maintain the weight values in existing analogue
neural networks. The advantage is that the system is continuously programmable, which
allows self-learning applications, e.g. with a microcontroller. The clock and the
programming of the memory do not affect the networks, so they can operate without
interruption. The circuit is also applicable in a mixed design, reducing the number of external
parts. The experimental circuit is designed in Solo 1400’s 1.0 µm technology; the chip area is
approximately 4 mm². This work has been supported by the Ministry of Education of the
Slovak Republic under Grant No. 1/6096/99.
Abstract. In this paper, we compare two radial basis function neural
network (RBF NN) models for estimating human signal detection performance
from brain event-related potentials (ERP) elicited by task-relevant
signals. The data consist of ERPs and performance measures (PF1) from five
human operators. Individual RBF NN models of PF1 are built using the modified
Minimum Description Length (mMDL) method. The results (the number of
hidden units and the quality of approximation) are compared with those
achieved with RBF NN models designed using the orthogonal least
squares (OLS) method. We conclude that the mMDL method makes it
possible to build an RBF NN with a substantially lower number
of hidden units, but at the expense of a slightly worse quality of approximation
than with the OLS method.

1  Introduction 

In many important tasks (e.g. air traffic control, piloting of vehicles) control is based
on the ability of the operator to detect and evaluate task-relevant signals in presented visual
data. The performance quality of the operator varies over time, often falling below acceptable
limits. Such performance variability may have serious consequences. In many of these tasks,
the likelihood of such errors could be reduced if objective methods for the assessment of human
performance were available. We have utilised ERP data to build an RBF NN model
for estimating human signal detection performance. ERPs reflect mental processes
and are known to be related to human performance, including signal detection, target
identification and recognition, memory and mental computation [6].

2  Data  set 

The ERP data (recorded from 3 electrodes) were acquired in an earlier study
([6]) during a signal detection task from 5 human operators.

Within each block of trials a running mean ERP was computed for each operator using
a 10-trial moving window ([2]). As we were confronted with the curse of dimensionality,
principal component analysis (PCA) was applied to the running mean ERPs. The ERP
data preprocessed in this way form the input data for the RBF NN models for estimating
human signal detection performance. The dimension of the input vector for an individual
operator is 30, and the number of input vectors ranges between 400 and 900.

The performance of each operator was measured by monitoring three parameters: speed,
accuracy and confidence. The performance measure, PF1, was derived using factor analysis
of the performance data ([2]):

PF1 = 0.33 * Accuracy + 0.53 * Confidence - 0.51 * Reaction Time (1)

Running mean PF1s (computed in the same way as the running mean ERPs) form the output
scalar for the RBF NN models.
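As a small illustration of equation (1) and of the 10-trial moving window, the sketch below computes PF1 per trial and then its running mean. The function names are hypothetical, and the scaling of the three input measures (for example, whether they are standardised) is not stated in the text, so the inputs here are plain numbers.

```python
def pf1(accuracy: float, confidence: float, reaction_time: float) -> float:
    """Composite performance measure from equation (1)."""
    return 0.33 * accuracy + 0.53 * confidence - 0.51 * reaction_time

def running_mean(values: list[float], window: int = 10) -> list[float]:
    """Mean over every full window of consecutive trials."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

# Example with made-up trial data: 12 trials give 3 running-mean values.
trials = [(1.0, 0.5, 0.2)] * 12
scores = [pf1(a, c, rt) for a, c, rt in trials]
print(running_mean(scores))
```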

3  RBF  neural  network  model 

An RBF NN with n inputs and scalar output is given by

$$F(x) = w_0 + \sum_{j=1}^{m} w_j \, h(\|x - c_j\|) = w_0 + \sum_{j=1}^{m} w_j \, e^{-\|x - c_j\|^2 / r_j^2} \qquad (2)$$

where $x \in \mathbb{R}^n$ is the input vector, $F(x)$ is the output of the RBF network, $w_j$ ($0 \le j \le m$)
are the weights, $h(\cdot)$ is the given radial basis function (RB-function) from $\mathbb{R}^+$ to $\mathbb{R}$ (we have
used the Gaussian RB-function), $\|\cdot\|$ denotes the Euclidean norm, $c_j \in \mathbb{R}^n$ ($1 \le j \le m$) are
the centers and $r_j \in \mathbb{R}^n$ are the widths of the RB-functions, and m is the number of centers,
and thus the number of RB-functions (number of hidden units).
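The network output can be evaluated directly from this definition. The following is a minimal sketch that assumes one scalar width per RB-function and a Gaussian of the form exp(-d^2 / r^2); both are simplifying assumptions, since the exact normalisation of the exponent is not recoverable from the text.

```python
import math

def rbf_output(x, centers, widths, weights, bias):
    """Weighted sum of Gaussian RB-functions plus the bias weight w0."""
    total = bias
    for c, r, w in zip(centers, widths, weights):
        dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        total += w * math.exp(-dist_sq / r ** 2)
    return total

# One hidden unit centred at the origin: the output peaks at the center.
print(rbf_output([0.0, 0.0], [[0.0, 0.0]], [1.0], [2.0], 0.5))
```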

The design and training of RBF NNs consist of 5 tasks: the choice of the RB-function;
determining the number of RB-functions; finding their centers and widths; and adapting
the weights ([4]). The choice of the RB-function is not as critical for the performance of the
RBF NN as the process of finding the centers [1]. Thus, the main question is how to select
appropriate centers from the training set.

4  Methods 

In an earlier study [2] the OLS method was used to design the RBF NN models for
estimating human signal detection performance. The OLS method ([1]) is an algorithm for
selecting a suitable subset of input vectors as the centers. The center selected in each step
is the one that reduces the error of the network the most. Centers are selected until an
adequate RBF NN is constructed. There is no guarantee that this method produces the
smallest RBF NN for a given quality of approximation ([5]). Encouraged by preliminary
results [3], we decided to use the MDL method, which seems to produce smaller networks
while preserving approximation quality.

The mMDL method ([3]) involves two procedures: adaptation (training) and selection.
The adaptation procedure changes the locations and widths of the RB-function centers
and adjusts the weights. The selection procedure, according to the Minimum Description
Length (MDL) principle, selects from the set of all RB-functions those that
describe the training data with the shortest possible encoding. In other words, if several
RB-functions cover the same space, those RB-functions should be selected
that describe a larger portion of the data and contribute less to the overall error of
the network. More specifically, at the beginning all samples from the training set are used
                    RBF NNs + OLS                    RBF NNs + mMDL
Oper.   ERPs    NC            Test set NMSE      NC            Test set NMSE
                mean (std)    mean (std)         mean (std)    mean (std)

A       891     387 (56)      0.163 (0.030)      61 (3)        0.306 (0.081)
B       592     171 (42)      0.119 (0.028)      74 (1)        0.139 (0.040)
C       417     175 (26)      0.231 (0.050)      44 (3)        0.316 (0.094)
D       734     249 (109)     0.080 (0.020)      82 (7)        0.107 (0.030)
E       776     249 (60)      0.175 (0.025)      76 (4)        0.240 (0.059)
Table 1: Comparison of the approximation errors (NMSE) achieved on the test set and
the number of hidden units (NC) for RBF NNs using the OLS and mMDL methods.
The values represent an average of 10 simulations, with the standard deviation in parentheses.


as centers for the RB-functions. Such a network behaves like a look-up table. To
achieve generalization, the network is allowed to adapt. Since this adaptation usually
results in overlap among the RB-functions, it is necessary to eliminate the
redundant ones from the whole set, which is done by the selection procedure. These two
procedures are repeated until no more RB-functions are removed as redundant.

5  Results 

Both RBF NN models were validated using 10-fold cross-validation ([4]). The
simulations (using the mMDL method) were implemented in Matlab using the package
of routines provided by H. Bischof. The quality of approximation was measured in terms
of the number of centers (NC) of RB-functions and the normalized mean square error (NMSE),
defined as

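$$\mathrm{NMSE} = \frac{\sum_{p=1}^{P} \left(t^p - F(x^p)\right)^2}{\sum_{p=1}^{P} \left(t^p - \bar{t}\right)^2}, \quad \text{where} \quad \bar{t} = \frac{1}{P} \sum_{p=1}^{P} t^p \qquad (3)$$

and $P$ is the number of ERPs (gathered for one operator), $t^p$ is the target output value and
$F(x^p)$ is the output value of the network for the p-th input vector from the training set.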
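The definition in equation (3) translates into a few lines of code. The sketch below is a plain-Python illustration with a hypothetical function name; it shows that a perfect model gives NMSE = 0 and a model that always outputs the mean target gives NMSE = 1.

```python
def nmse(targets: list[float], outputs: list[float]) -> float:
    """Squared error normalised by the spread of the targets around their mean."""
    mean_t = sum(targets) / len(targets)
    num = sum((t - y) ** 2 for t, y in zip(targets, outputs))
    den = sum((t - mean_t) ** 2 for t in targets)
    return num / den

print(nmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # perfect fit
print(nmse([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # constant mean predictor
```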

First, let us look at the number of hidden units (NC) of the RBF NN models
constructed by both methods. As we can see in Table 1, better results are achieved
using the mMDL method (NC is on average 3.8 times lower than the NC achieved using the
OLS method). Across operators, the average NC for the RBF NNs + mMDL is 67,
compared with 246 for the RBF NNs + OLS. The standard deviation for the RBF
NNs + mMDL (on average 5.4% of the NC) is also lower than for the RBF NNs +
OLS (on average 24.4% of the NC).

Now, let us look at the quality of approximation in terms of NMSE. The NMSE achieved
on the test set with the RBF NNs + OLS is lower than with the RBF NNs + mMDL (on
average 1.424 times). Across operators, the average NMSE for the RBF NNs + OLS is
0.1536 and for the RBF NNs + mMDL it is 0.2216. The standard deviation for
the RBF NNs + OLS (on average 20.6% of the NMSE) is also lower than for the RBF NNs +
mMDL (on average 27.5% of the NMSE).
From the comparison above we can conclude that the mMDL method makes it
possible to construct an RBF NN with a lower number of hidden units, but at the expense
of a slightly worse quality of approximation than with the OLS method.
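The summary figures quoted in this section can be reproduced from the mean values in Table 1. The short script below is only an arithmetic check; the variable names are illustrative, and the ratios are computed per operator and then averaged.

```python
# operator: (NC with OLS, NMSE with OLS, NC with mMDL, NMSE with mMDL), from Table 1
table1 = {
    "A": (387, 0.163, 61, 0.306),
    "B": (171, 0.119, 74, 0.139),
    "C": (175, 0.231, 44, 0.316),
    "D": (249, 0.080, 82, 0.107),
    "E": (249, 0.175, 76, 0.240),
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

mean_nc_ols = mean(r[0] for r in table1.values())
mean_nc_mmdl = mean(r[2] for r in table1.values())
mean_nmse_ols = mean(r[1] for r in table1.values())
mean_nmse_mmdl = mean(r[3] for r in table1.values())
nc_ratio = mean(r[0] / r[2] for r in table1.values())    # OLS needs this many times more units
nmse_ratio = mean(r[3] / r[1] for r in table1.values())  # mMDL error relative to OLS error

print(round(mean_nc_ols), round(mean_nc_mmdl), round(nc_ratio, 1))
print(round(mean_nmse_ols, 4), round(mean_nmse_mmdl, 4), round(nmse_ratio, 3))
```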

6  Conclusions 

In this paper we have compared RBF NN models for estimating human signal
detection performance. The results achieved using the mMDL method were better than
those achieved using the OLS method with respect to NC.

Our preliminary simulations show that, to achieve better results in terms of NMSE,
it would be desirable to use a more powerful training algorithm (e.g. the Levenberg-Marquardt
algorithm) in the adaptation procedure of the mMDL method than backpropagation
(BP) with momentum and an adaptable learning rate. The BP algorithm was used because
of computer capacity limitations.

When we look more closely at the NC achieved with the RBF NNs based on the mMDL
method, the values seem to be comparable for all operators. This suggests the existence
of a general model for estimating the performance of a human operator. It would then remain
only to adjust this model for the individual operator. However, verification of this hypothesis
would require more data (not only from 5 operators).
Abstract This paper deals with an analog implementation of a learning
synapse, which is a vital part of a neural network. The proposed synapse includes
a postsynaptic potential forming block, which makes it possible to characterise
each synapse output uniquely in a complete neural network. This approach is
conceptually closer to its biological counterpart. The design uses the switched
capacitor technique in order to minimise the design area and so make the
above-described modification realisable.

1  Introduction 

In many neural network applications an implementation of a learning synapse is required.
Such synapses provide the interconnections of all neurons in the network. In order to create the
matrix of synapses, the following synapse parameters should be taken into account: the time
constant of the exponential decrease of the postsynaptic potential (PSP), the learning rate for
up and down learning, and the external threshold NMDA potential. It is important for each
synapse to have a specific output signal characteristic (the time constant of PSP discharge is
assumed in our case) [1]. Therefore, it is not possible to use a circuit with a soma potential
capacitor (SPC) at the input of the neuron and charge it only by a current pulse from the output
of a synapse, as was proposed in [2]. Rather, some mechanism for controlling the synapse output
characteristic has to be introduced.

In this paper, the design of an SPC-based circuit for the realisation of a learning synapse is
presented. In order to achieve an area-efficient implementation of the synapse, the switched
capacitor (SC) technique was employed for the design. Moreover, in an SC circuit the
unique output characteristic of each synapse can be easily tuned by a capacitance ratio. Finally,
the synapse output characteristic (i.e. the time constant of the SPC defined as a number of clock
cycles) does not depend on the clock signal frequency, which offers the possibility of using this
solution in a wide frequency range.

2  SC  Synapse  Realisation 

The  goal  of  a  synapse  in  a  neural  network  is  to  provide  a  transformation  of  a  neuron 
output  signal.  The  possible  transformations  include  voltage  to  current  conversion,  pulse  shaping 
and  signal  weighting.  The  general  scheme  of  a  learning  synapse  is  depicted  in  Fig.  1.  The 
functionality  of  the  building  blocks  can  be  described  as  follows: 

•  A  circuit  for  shaping  the  postsynaptic  potential  (PSP),  which  causes  exponential  decay  of 
the  PSP  upon  the  arrival  of  signal  ACTIVITY  from  transmitting  neuron. 

•  A  voltage  adder,  producing  a  new  initial  voltage,  up  to  which  SPC  must  be  charged 

•  A  weight  adjustment  circuit,  causing  changes  of  the  weight  stored  on  the  memory  capacitor 

•  A weight refresh circuit dynamically refreshing the weight value on the memory capacitor

•  A  V/I  converter  converting  the  PSP  to  output  current  from  the  synapse 
Fig. 1 Block diagram of learning synapse

Learning is based on Hebb’s learning rule, where the value and direction of the weight changes
are directly proportional to the time between the transmitting and the receiving neuron activity.

2.1  Circuit  for  exponential  decay  of  PSP 

The proposed PSP circuit employs two capacitors and two switches controlled by the main
clock signal with complementary phases, as shown in Fig. 2. The total time constant of the
circuit is given by the C1/C2 ratio and the clock signal frequency (where C1 and C2 represent
the SPC and an auxiliary capacitor, respectively). The whole circuit in fact acts as a classical
RC low-pass filter. The SPC could be charged by a current source.
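The dependence of the decay on the C1/C2 ratio can be illustrated with an idealised charge-sharing model. The sketch assumes that in every clock period the auxiliary capacitor C2 is first emptied and then connected to C1, so the PSP voltage is scaled by C1 / (C1 + C2) per cycle. This switching sequence and the function names are assumptions for illustration, not details taken from Fig. 2.

```python
import math

def psp_after_cycles(v0: float, c1: float, c2: float, cycles: float) -> float:
    """PSP voltage after a number of clock cycles of charge sharing."""
    return v0 * (c1 / (c1 + c2)) ** cycles

def time_constant_cycles(c1: float, c2: float) -> float:
    """Number of cycles after which the PSP has decayed to 1/e of its start value."""
    return -1.0 / math.log(c1 / (c1 + c2))

def time_constant_seconds(c1: float, c2: float, f_clk: float) -> float:
    """Same time constant expressed in seconds for a given clock frequency."""
    return time_constant_cycles(c1, c2) / f_clk

# A capacitance ratio of 100 gives a time constant of roughly 100 clock cycles.
print(time_constant_cycles(100.0, 1.0))
```

In this model the time constant measured in clock cycles depends only on the capacitance ratio (close to C1/C2 when C2 is much smaller than C1), while the time constant in seconds scales inversely with the clock frequency.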

However, due to the small capacitance of the SPC, very low currents would have to be used to
charge it, which makes the design difficult. Alternatively, a voltage source can be used to charge
the SPC. In that case, the new voltage across the SPC has to be provided by summing
the weight value and the old voltage across the SPC. This sum is calculated in the voltage adder
block described in the next section.

Voltage  adder 

This circuit is controlled directly by the Activity signal from the transmitting neuron. Thus
the new voltage for charging the SPC is available only during the active state of this signal. The
scheme of the voltage adder block is shown in Fig. 3.


Fig.  2  PSP  circuit  as  SC  equivalent  of  RC  LPF 


Fig.  3: SC  Voltage  adder  block 
Weight adjustment circuit

In the simplest case the designed synapse contains a dynamic up and down learning
circuit (down learning is also called forgetting), where the value of the weight is adjusted (see
Fig. 4). The change itself depends on the instantaneous PSP value at the time of the Activity
signal of the receiving neuron (Hd), on the constant a given by the ratio of the weight memory
capacitor and the auxiliary capacitor Cj, and on the NMDA value, which determines the learning
direction over time.


Fig.  4:  Weight  adjustment  block 


Weight  refresh 

The weight memory capacitor self-discharges through surface currents caused by various
side effects such as SiO2 impurities. The value stored in the weight memory capacitor
therefore has to be refreshed regularly. The designed circuit was derived from the refresh scheme
proposed in [3], except that a simple sawtooth signal can be used in our case. The dynamic range
of the stored weight memory value is divided into 255 intervals, which is equivalent to an 8-bit
weight. The whole block is controlled by the Reset, Reset2 and Ramp signals. Reset and
Reset2 are derived from the main clock signal (with Treset = Tclk, Treset2 = 256*Treset). The
Ramp signal has a period equal to Treset2 and an amplitude corresponding to the dynamic range
of the stored memory value.


Fig.  5:  Weight  refresh  block 


3  Experimental  Results 

The circuit was designed in CADENCE DF II ver. 4.4.1 in MIETEC 2.4 µm technology.
Results were obtained by means of the CADENCE Analog Artist and Hspice ver. 8.2 simulators.
Fig. 6 shows the simulations of the PSP block, the weight update block and the voltage adder
block, which are essential for the functionality of the synapse. Fig. 6(a) shows the exponential
decline of the postsynaptic potential with respect to the Activity input and the proper
functioning of the voltage adder block afterwards. The process of weight update with respect to
the transmitting and receiving neuron activities and the PSP value is depicted in Fig. 6(b).


Fig. 6: a) Simulation of PSP and voltage adder circuit, b) Simulation of learning circuit

(Trace labels: Activity of neuron; Activity of transmitting neuron; Activity of receiving neuron; Synapse weight)


4  Conclusion 

In this paper an analogue design of the complete learning synapse, except for the V/I
conversion block, is presented. The total design area was 260 x 110 µm². The synapse was
realised using the switched capacitor technique, which has a deep impact on the design. For
example, the SPC realised using this technique has a value of 800 fF (compared to 20 pF in a
non-SC design), which makes it feasible to include a postsynaptic potential block in each synapse.
Therefore, it is possible to define a unique output characteristic (time constant) for each synapse,
which is closer to a biological neural network.
Abstract We have designed a 32-b RISC microprocessor with 16-/32-b
fixed-point  DSP  functionality.  This  processor,  called  YD-RISC,  has  functional 
units  for  arithmetic  operation,  DSP  and  memory  accesses.  They  operate  in  parallel 
in  order  to  remove  stall  cycles  after  DSP  or  load/store  instructions  with  one  or 
more  latency  cycles.  High  performance  is  achieved  with  the  parallel  functional 
units  while  adopting  a  sophisticated  five-stage  pipeline  structure.  The  DSP  unit 
can  execute  one  32-b  multiply-accumulate  or  16-b  complex  multiply  instruction 
every  one  or  two  cycles  through  two  17-b  x  17-b  multipliers  and  operand 
examination  logic  circuits.  Power-saving  circuits  allow  low  power  consumption. 


I.  Introduction 

Embedded microcontrollers are used to execute general-purpose programs [1], [2].
DSPs  are  used  for  specialized  applications  such  as  image  and  voice  processing  [3].  Recently 
low-cost  embedded  processors  with  both  microcontroller  and  DSP  functionality  have  become 
necessary  in  the  advanced  consumer  electronics  applications.  Simply  combining  both 
functions  using  two  separate  cores  in  a  single  chip  is  not  cost-effective  and  not  appropriate  for 
embedded systems considering the low-cost need in consumer electronics [4]. Therefore, we
designed  a  processor  that  combines  both  microcontroller  and  DSP  functions  into  one 
processor  without  doubling  resources,  while  providing  the  main  features  of  RISC  architectures 
for  flexibility  in  programming,  also  achieving  high  DSP  performance. 

Design  methodology,  which  is  mainly  classified  into  full-custom  design  and  logic 
synthesis  using  gate-level  cell  libraries,  is  a  very  important  point  for  successful  processor 
designs.  The  full-custom  method  is  suitable  for  high  performance,  small  area  and  low  power 
consumption,  but  it  is  time-consuming  and  labor-intensive.  The  method  through  synthesis  is 
time  and  labor-saving,  thus  making  short  time-to-market  possible  which  is  an  important 
requirement  for  successful  marketing.  Moreover,  synthesis  makes  it  easy  to  improve  designs, 
because  updating  designs  is  simply  re-synthesizing  using  improved  libraries  and  process 
technology.  We  designed  the  YD-RISC  through  logic  synthesis  and  automatic  place-and-route 
and  achieved  a  high  clock  frequency  by  refining  the  architecture  with  the  five-stage  pipeline. 

II.  Design  Points 

In  the  design  of  the  YD-RISC,  various  new  concepts  are  used  to  optimize  performance, 
area  and  power  consumption.  These  concepts  are  described  in  detail  below. 
A. Pipeline and Parallelism

Instructions  are  divided  into  ALU,  DSP  and  load/store  instructions.  As  shown  in  Fig.  1, 
the  ALU  pipeline  consists  of  five  stages  in  accordance  with  the  basic  RISC  pipeline  structure, 
and  DSP  and  load/store  pipelines  have  separate  and  parallel  execution  stages  independent  of 
ALU  execution  and  write-back  stages.  This  enables  the  instructions  to  be  executed 
continuously  after  the  DSP  or  load/store  instructions  with  one  or  more  latency  cycles 
regardless  of  the  completion  of  these  instructions  as  shown  in  Fig.  2,  thus  improving 
performance  dramatically. 


Fig.  1.  Basic  pipeline  structure.  Fig.  2.  Parallelism  between  functional  units. 


B.  DSP  Unit 

The  DSP  unit  can  do  both  16-b  x  16-b  and  32-b  x  32-b  multiplications.  A  32-b  x  32-b 
multiplier occupies too large an area and can perform only one 16-bit multiplication at a time.
In  contrast,  two  16-b  x  16-b  multipliers  occupy  about  half  the  area  of  one  32-b  x  32-b 
multiplier  and  can  perform  two  16-b  x  16-b  multiplications  simultaneously.  The  DSP  unit  of 
the  YD-RISC  can  execute  two  independent  16-b  x  16-b  multiplications  in  one  cycle  and  one 
32-b  x  32-b  multiplication  in  two  cycles  using  the  two  17-b  x  17-b  multipliers.  In  addition, 
the  number  of  cycles  in  32-b  x  32-b  multiplication  is  further  reduced  to  only  one  through  the 
operand  examination  scheme  as  shown  in  Tab.  I. 


Tab. I. Operand examination scheme for 1-cycle multiplication

Operation                  Operand condition for 1-cycle 32-bit multiplication

32-bit signed multiply     Operand[31:16] is equal to 0000(16) or FFFF(16)
                           or
                           Operand[15:0] is equal to 0000(16)

32-bit unsigned multiply   Operand[31:16] is equal to 0000(16)
                           or
                           Operand[15:0] is equal to 0000(16)
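The conditions in Tab. I amount to a simple test on the two halves of a 32-bit operand. The sketch below encodes that test and the resulting cycle count; the function names are hypothetical, and applying the test to a single operand word is an assumption, since the table does not say which operand is examined.

```python
MASK16 = 0xFFFF

def one_cycle_eligible(operand: int, signed: bool) -> bool:
    """Check the Tab. I conditions on one 32-bit operand word."""
    hi = (operand >> 16) & MASK16   # Operand[31:16]
    lo = operand & MASK16           # Operand[15:0]
    if lo == 0x0000 or hi == 0x0000:
        return True
    # For signed multiplication an all-ones upper half is plain sign extension.
    return signed and hi == 0xFFFF

def multiply_cycles(operand: int, signed: bool) -> int:
    """One cycle when the operand passes the examination, otherwise two."""
    return 1 if one_cycle_eligible(operand, signed) else 2

print(multiply_cycles(0x0000ABCD, signed=False))  # upper half zero
print(multiply_cycles(0x12345678, signed=True))   # general case
```

The intuition is that an operand whose upper half is only sign or zero extension fits the 17-bit multiplier inputs, so the two partial products can be formed in parallel.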

Tab. II. Average instruction length

Program    No. of executed instructions    Average instr. length (bits)

hello      1096                            24.44
hanoi4     3021                            25.41
bsort10    28160                           26.64
ssort10    20455                           23.27
qsort10    15425                           24.35
siv1024    22044                           21.84
dct22f     173583                          21.20
fft8f      15389                           21.41
dct22d     2381                            22.28
fft8d      508965                          22.98
fir        40653                           24.74
iir        49362                           22.37
C. Instruction Prefetch

This  architecture  has  no  instruction  or  data  cache,  and  thus  can  be  vulnerable  to 
execution  stall  due  to  long  memory  access  time  and  insufficient  instruction  availability.  In 
order  to  overcome  this  drawback,  the  processor  adopts  an  internal  4-Kbyte  SRAM  and  a 
prefetch  buffer,  which  is  implemented  as  a  circular  buffer  with  a  head  pointer  and  a  tail 
pointer.  The  loop  control  of  the  prefetch  buffer  reduces  the  bus  utilization  of  this  processor, 
and  thus  improves  performance  when  the  program  consists  of  small  loops  to  be  processed 
repeatedly. Moreover, this architecture adopts a variable-length instruction format of 16 bits,
32 bits, or 48 bits, which reduces the average instruction length below 25 bits in most
programs, as shown in Tab. II. The improved code density due to this variable-length instruction
format not only saves memory, but also improves instruction supply and performance. Also,
bus arbitration between instruction memory access and data memory access is optimized using
the results of various simulations.
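A circular buffer with head and tail pointers, as used for the prefetch buffer, can be sketched as follows. This is a generic ring-buffer illustration with a hypothetical class name and capacity; it does not model the loop control or the actual buffer size of the YD-RISC.

```python
class PrefetchBuffer:
    """Fixed-size FIFO addressed by a head (read) and a tail (write) pointer."""

    def __init__(self, capacity: int):
        self.slots = [None] * capacity
        self.head = 0    # next slot to read
        self.tail = 0    # next slot to write
        self.count = 0   # number of valid entries

    def push(self, word) -> bool:
        if self.count == len(self.slots):
            return False                 # buffer full: prefetching must wait
        self.slots[self.tail] = word
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1
        return True

    def pop(self):
        if self.count == 0:
            return None                  # buffer empty: the pipeline would stall
        word = self.slots[self.head]
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return word

buf = PrefetchBuffer(4)
for word in (0x1111, 0x2222, 0x3333):
    buf.push(word)
print(hex(buf.pop()))
```

Because both pointers wrap around, the buffer can be refilled while earlier entries are still being consumed, without moving any data.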

D.  Power  Consumption 

Low power consumption is essential in embedded processors. In the power-down mode provided
in  our  processor,  only  the  clocks  for  the  logic  related  to  the  refresh,  interrupt,  timer  and  reset 
are  enabled,  while  all  other  clocks  are  disabled,  using  the  circuit  shown  in  Fig.  3.  Also,  power 
consumption  by  unnecessary  switching  in  execution  blocks  is  avoided  through  the  circuit  in 
Fig.  4,  which  permits  only  enabled  execution  blocks  to  receive  new  input  operands.  Power 
consumption  is  noticeably  reduced  by  these  power-saving  techniques. 


(Fig. 3 signal labels: clock from external source; permanent clock to the timer, interrupt, refresh and reset logic; clock disabled in power-down mode, to all other parts)


Fig.  3.  Clock  disable  circuit. 


Fig.  4.  Execution  block  disable. 


III. Implementation and Performance

The block diagram in Fig. 5 shows the main architecture of the entire microprocessor.
The logical model of this processor is described using an industry-standard HDL and is
verified with an HDL software simulator. The register file, prefetch buffer and internal SRAM
are designed with established macro cell libraries, while the other blocks are logic-synthesized
with gate-level cell libraries. The layout of the processor is shown in Fig. 6 and measures about
8 mm x 8 mm. The processor, designed in a 3.3 V 0.6-µm triple-metal technology, reaches a clock
frequency of 50 MHz. This frequency is high considering that LG Semicon’s custom-designed
GMS30C32132, which adopts a similar architecture and the same technology, runs at
30 MHz [5]. Moreover, the frequency of this processor can be almost doubled by redesigning it
in the currently available 0.35-µm technology. The YD-RISC shows slightly better performance
than the GMS30C32132 in integer applications and twice the performance of the GMS30C32132
in DSP applications. Tab. III shows the complex FFT performance comparison of the YD-RISC
with different DSP processors, assuming that the performance of the YD-RISC is one.
Fig. 5. Architecture of the YD-RISC.


Fig.  6.  Layout  of  the  YD-RISC. 


Tab. III. 1K 16-bit complex FFT performance comparison of the YD-RISC with different processors

DSP processor        Relative performance

YD-RISC              1.00
LGS GMS30C32132      0.54
TI TMS320C50         0.31
ADSP2175             0.68
Motorola 56002       0.78
AT&T DSP1617         0.30
NEC UP77016          0.74

IV.  Conclusions 

In this microprocessor, RISC and DSP functionality are combined cost-effectively
while high performance is achieved. Excellent performance on 32-b x 32-b multiplication and
simultaneous 16-b x 16-b multiplications is achieved through two 17-b x 17-b multipliers and
operand examination logic circuits. A power-down mode and the disabling of execution blocks
reduce the power consumption. By designing with logic synthesis and established libraries we
reduced the design time, and by refining the processor architecture we achieved a high clock
frequency.

Abstract. CIS-like speech-coding strategies for cochlear implants are
discussed with respect to their ability to extract the speech signal envelope.
This paper describes the approach of the new HT CIS speech-coding
strategy, which obtains the envelope of the speech signal by using a Hilbert transformer.


I.  Introduction 


Cochlear  prostheses  are  used  for  patients  who  have  a  mechanical  defect  in  the  inner  ear. 
Although  their  nervous  system  may  be  completely  intact,  if  the  hair  cells  cannot  be  stimulated, 
acoustic  sensation  cannot  be  obtained.  In  such  cases  electrical  stimulation  can  be  used  to 
overcome  the  defect  of  mechanical  transmission. 

Nowadays, the speech-coding strategies most widely used commercially for cochlear implant
users are CIS strategies [2]. One of the main tasks of a CIS strategy is to divide an input signal
into a few frequency bands with the aim of obtaining the signal envelope and generating
non-simultaneous biphasic current stimulation pulses. This signal envelope is usually captured by
using a full-wave rectifier and low-pass filtering, or by using a half-wave rectifier. These
solutions bring some distortions. However, the half-wave rectifier preserves temporal information
better in the lowest frequency bands. Due to the frequency doubling caused by the full-wave
rectifier, the temporal information may be lost. Recent results, e.g. [1], showed that
the firing probability of nerve fibers with a high characteristic frequency is correlated with the
envelope of the amplitude modulated (AM) stimulus. Joris and Yin also showed that the phase
of low frequency AM signals is clearly present in the nerve fiber activity. The effect is referred
to as 'phase locking'.

These problems may be solved by using a Hilbert transformer in the new HT CIS
speech-coding strategy, so that we can capture the envelope with minimal distortion.

II.  Design  of  Hilbert  transformer 

Let us imagine a complex linear filter $H(e^{j\theta})$ with the frequency response

$$H(e^{j\theta}) = \begin{cases} 2, & 0 < \theta < \pi \\ 0, & -\pi < \theta < 0 \end{cases} \qquad (1)$$

to obtain an output signal whose spectrum vanishes for negative frequencies. Provided $x[n]$
is a real input sequence, the complex output sequence $y[n]$ of such a filter is

$$y[n] = x[n] \ast h[n] = x[n] + j\,(x[n] \ast h_i[n]) \qquad (2)$$

where $h_i[n]$ is the imaginary part of the impulse response $h[n]$, as given by

$$h_i[n] = \begin{cases} \dfrac{2}{n\pi}, & n \text{ odd} \\ 0, & n \text{ even} \end{cases} \qquad (3)$$


It can be seen from (2) that the real part of the complex sequence $y[n]$ is just the
original sequence $x[n]$, and the imaginary part of $y[n]$ is obtained by passing $x[n]$ through a
linear filter with the impulse response stated in (3). The filter of equation (3) is called the ideal
discrete-time Hilbert transformer. The frequency response of the ideal discrete-time Hilbert
transformer is

$$H_i(e^{j\theta}) = \begin{cases} -j, & 0 < \theta < \pi \\ j, & -\pi < \theta < 0 \end{cases} \qquad (4)$$


As in the continuous-time case, the discrete-time Hilbert transformer can be regarded as an
allpass filter providing a phase shift of $\pi/2$ radians at all frequencies.

Assume an AM input signal $x[n] = 2e[n]\cos\theta_0 n$, where $\theta_0$ is the carrier
frequency of the input signal and $2e[n]$ is the input-signal envelope. If the spectrum
of the input signal can be written as $X(e^{j\theta}) = E(e^{j(\theta-\theta_0)}) + E(e^{j(\theta+\theta_0)})$,
the analytic signal $y[n]$ can be written as

$$y[n] = \frac{1}{\pi}\int_{0}^{\pi} X(e^{j\theta})\,e^{j\theta n}\,d\theta = 2e[n]\,e^{j\theta_0 n} = 2e[n]\cos\theta_0 n + j\,2e[n]\sin\theta_0 n \qquad (5)$$

From (5) we can see that if the input signal $x[n]$ is real valued, the magnitude of the output
sequence $y[n]$ equals the envelope $2e[n]$ of the input sequence $x[n]$.
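
Equations (2), (3) and (5) can be illustrated numerically. The sketch below truncates the ideal response (3) to 2M+1 taps, applies a Hamming window, and recovers the envelope of an AM test signal as the magnitude of the analytic signal. It is only an illustration under stated assumptions: the function names, the window, the value of M and the test signal are hypothetical and are not taken from the implementation described in this paper.

```python
import math


def hilbert_fir(M):
    """Taps h_i[n], n = -M..M, of equation (3), tapered by a Hamming window (assumed)."""
    taps = []
    for n in range(-M, M + 1):
        ideal = 2.0 / (n * math.pi) if n % 2 != 0 else 0.0   # 2/(n*pi) for odd n, 0 for even n
        window = 0.54 + 0.46 * math.cos(math.pi * n / M)
        taps.append(ideal * window)
    return taps


def envelope(x, M):
    """Return |y[n]| for n = M .. len(x)-M-1, with y[n] = x[n] + j (x * h_i)[n] as in (2)."""
    h = hilbert_fir(M)
    env = []
    for n in range(M, len(x) - M):
        imag = sum(h[k + M] * x[n - k] for k in range(-M, M + 1))
        env.append(math.hypot(x[n], imag))
    return env


def am_envelope(n):
    """Hypothetical slowly varying envelope e[n] used for the test signal."""
    return 1.0 + 0.5 * math.cos(0.02 * math.pi * n)


if __name__ == "__main__":
    M = 31
    theta0 = math.pi / 2          # carrier in the middle of the band
    x = [2.0 * am_envelope(n) * math.cos(theta0 * n) for n in range(400)]
    env = envelope(x, M)
    worst = max(abs(env[i] - 2.0 * am_envelope(i + M)) for i in range(len(env)))
    print("largest envelope error:", worst)
```

In a causal realisation the centred sum corresponds to delaying the direct path by M samples, which is the role of the delay line in the first branch of the envelope detector.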


III.  FIR  Linear-Phase  Design 


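Specifically, suppose we shift the frequency response $H_i(e^{j\theta})$ of equation (4) by $\pi/2$
radians to obtain a new filter $G(e^{j\theta})$ with the frequency response

$$G(e^{j\theta}) = H_i(e^{j(\theta+\pi/2)}) = \begin{cases} 1, & |\theta| < \pi/2 \\ 0, & \pi/2 < |\theta| < \pi \end{cases} \qquad (6)$$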


Note that this is simply the ideal half-band lowpass filter with passband magnitude scaled to the
value 1. A complex half-band filter $H(e^{j\theta})$ can be realised by using a Hilbert transformer
as the imaginary part of a complex filter. Suppose that the order of the filter $G(e^{j\theta})$ is $2N$,
where $N$ is odd, that its impulse response is symmetric, $g[2N - n] = g[n]$, and that every odd
sample of $g[n]$ is set to zero. Consequently, by "decimation" of the filter $G(e^{j\theta})$ we may get
the filter $E(e^{j2\theta})$:

$$E(e^{j2\theta}) = G(e^{j\theta}). \qquad (7)$$

The frequency response of the complex filter $H(e^{j\omega})$ may be written as follows

$$H(e^{j\omega}) = e^{-jN\omega}\,(1 + jH_i(\omega)) = e^{-jN\omega}\,(1 + jG(\omega - \tfrac{\pi}{2})) = e^{-jN\omega}\,(1 + jE(2(\omega - \tfrac{\pi}{2}))) \qquad (8)$$

where $H_i(\omega)$, $G(\omega)$ and $E(\omega)$ are the zero-phase responses of the mentioned filters. The term
$E(2(\omega - \pi/2))$ describes a linear-phase filter with a group delay of $N$ samples. The
linear-phase FIR filter with an $N$-degree zero-phase response $E(\omega)$, $N$ odd, and a
symmetrical impulse response can be designed, e.g., by using the program of McClellan et al.

The envelope detection has been implemented in the CIS speech-coding strategy on a starter kit
with a fixed-point 16-bit ADSP-2181 signal processor. All the software
has been programmed in the assembler for the ADSP-2100 family.

The sampling frequency is set to 16 kHz. The stereo codec used, the AD1847, has a
linear-coded 16-bit A/D converter and an on-chip antialiasing filter. The dynamic range of the A/D
converter is 84 dB, so it is not necessary to compress the input signal. After the signal is
sampled, a pre-emphasis filter is applied. It is a 6th-order FIR filter designed by the
rectangular-window method. Digital Gain Control (DGC) is used as a limiter to overcome the
distortion or loud perception caused by very loud signals. When the input signal reaches a very
high value (usually the highest decibel level of the used dynamic range), the output signal is
decreased at a rate given by a programmable time constant; afterwards, when the input decreases
below a certain value (usually -3 dB from the maximum reachable value of the input), the
output signal is slowly increased. The attack and release time constants are programmable.
The signal is split by 8 band-pass filters spaced logarithmically over the frequency range from
300 to 5500 Hz. The design of the filter bank is very important. The frequency
of the input signal can be assigned by means of a proper steepness of the filters. Theoretically,
the steepness of the filters should not be higher than the steepness acquired from measurements
of the masking effect on normally hearing subjects, e.g. [3]. Using the sensitivity control circuit
we may adjust the gain for each channel separately. The following stage, the rectifier, is
different for each designed strategy.
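
The logarithmic split of the 300 to 5500 Hz range into 8 bands can be sketched as follows. The paper does not list the actual band edges, so the assumption here is that neighbouring edges share a constant frequency ratio; `log_band_edges` is a hypothetical helper name.

```python
def log_band_edges(f_low, f_high, bands):
    """Return bands+1 edge frequencies with a constant ratio between neighbours."""
    ratio = (f_high / f_low) ** (1.0 / bands)
    return [f_low * ratio ** k for k in range(bands + 1)]


if __name__ == "__main__":
    edges = log_band_edges(300.0, 5500.0, 8)
    for k in range(8):
        print("channel %d: %.0f Hz to %.0f Hz" % (k + 1, edges[k], edges[k + 1]))
```

With this spacing every channel covers the same fraction of an octave, so the low channels are much narrower in Hz than the high ones.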


Fig. 1 A block diagram of the envelope detection for the programmed HT CIS
speech-coding strategy

Figure 1 depicts a block diagram of the envelope detection for the proposed HT CIS
speech-coding strategy. The output of the band-pass filter is split into two lines. In the first line
the signal is delayed by N/2 samples, where N = 98 is the order of the designed Hilbert
transformer. In the second one the signal is filtered by the Hilbert transformer. The Hilbert
transformer was designed for the frequency range from 20 to 6500 Hz. The values in both lines
are then squared and summed, and the square root of the sum is taken. After the exact envelope
is detected, the signal is low-pass filtered with a cut-off frequency of 370 Hz. Because the first
four channels of the HT CIS strategy use an 8 kHz sampling frequency, a different low-pass
filter had to be designed for these channels. The low-pass filters are FIR filters of 61 and 100
coefficients, respectively. The low-pass filter at the output of the envelope detection is not
needed because of distortion, as it was with a full-wave rectifier (FWR), but only because exact
perception of the modulation signal requires at least four stimulus samples per period of the
modulation signal. Moreover, a loudness transformation compresses the signal into the range of
current stimulation. Finally, after the loudness transformation is performed, a volume control
and a subjective adjustment are applied. The volume control may change the volume in the range
from 0 to 100% in two modes, RTI and IBK. The subjective adjustment is set for each
channel separately and transforms the output signal into the dynamic range given by the
electrically evoked hearing threshold T and the most comfortable level C.

IV.  Discussion 

When a Hilbert transformer is used to extract the envelope, an input pure tone is
transformed into a DC signal with an amplitude proportional to the amplitude of the input
signal. An input AM signal is transformed into the modulation signal, so that the output
contains only, or mostly, the information that is important for stimulation of the
auditory nerve fibers [1].

However, for an AM signal with a carrier frequency lower than the modulation frequency
multiplied by 1.8, the output signal differs from that achieved with the half-wave
rectifier, especially for the two lowest channels. The differences among voiced phones are
larger than with the half-wave rectifier or the "conventional" full-wave rectifier with a 400 Hz
smoothing low-pass filter, so that a wider dynamic range of audible frequencies is used for an
active channel. For this reason the voiced phones may be better understood by using the HT
CIS strategy than with the PL CIS and CIS strategies.

At the ORL department of the University Faculty Hospital in Bratislava the new
speech-coding method for cochlear implant subjects is under investigation and has been successfully
used on three postlingually deaf subjects with the Combi 40 cochlear implant. More information
can be found in [2].

This work was supported by the Ministry of Education of the Slovak Republic under
Grant VP 1/9097/99.

Abstract. In this paper, we develop a digital filter signal processing algorithm
to improve the signal to noise ratio of a magnetic anomaly detection
system. Using this filter, we can remove the coherent noises in the time domain and
improve the signal to noise ratio of the magnetic anomaly detection system. We
show the ability of the geological magnetic filter under various circumstances
through computer simulations. Numerical simulation results show that the proposed
digital filter can remove the sensor misalignment effects and the regular
short range local noise as well as the coherent noises.


I.  Introduction 

A magnetic anomaly detection system detects changes of the short range magnetic fields
generated by magnetic materials and identifies the existence of magnetic anomalies. The
detecting ability of a magnetic anomaly detection system is mainly determined by the sensitivity
of the magnetic sensor. Recently, the development of high-Tc superconductors has enabled the
implementation of very sensitive magnetic detection systems at a reasonable cost. With the
increase of the sensitivity of the magnetic detection system, DSP algorithms that increase the
signal to noise ratio become more important. Fowler [1] suggested that the change of the
measured magnetic fields at different locations can be modeled as a filter and concluded, based
on experimental results, that the coherent magnetic field noises are dominant over the
incoherent terms.

In this paper, we develop a digital filter signal processing algorithm that
improves the signal to noise ratio of the magnetic detection system using the spatial coherency
of the magnetic noises. In the proposed algorithm, two sensors, a detector sensor and a reference
sensor located at a distance, measure the magnetic fields concurrently in the absence of
magnetic sources. After that, the correlations between the two fields are calculated in the
frequency domain and the digital filter coefficients are calculated. The coefficients reflect the
environments where the sensors are located. From the coefficients in the frequency domain, the
FIR filter coefficients are calculated. Using this digital filter, we can remove the coherent
magnetic noises.

II. Geological Magnetic Filter Theory

Magnetic fields are generated by magnetic moments, which are originally due to
the movements of charged particles. The intensity of the magnetic field drops rapidly as
the distance from the source increases. We can separate the measured magnetic fields into two
components, the long range field and the short range field, according to the distance from the
magnetic source. If the distance between the detector and reference sensors is very short
compared with the distance from the source, the difference of the measured magnetic fields
between the two sensors is very small. But if the distance from the source is comparable to the
sensor separation, the difference of the measured fields is very large. The change of the
measured long range magnetic fields between the detector and reference sensors is influenced
by the environments where the sensors are located. The influence of the magnetic
environments on the micropulsation fields can be modeled as follows. We can model the
magnetic environment where the reference sensor is located as a linear filter [2]. The input of
the filter is the micropulsation field with the orthogonal independent components $B^{s}_{x}$ and
$B^{s}_{y}$. The characteristics of the filter are determined by the environment. The output of the
filter is the reflected wave ($B^{r}_{x}$, $B^{r}_{y}$, $B^{r}_{z}$). The measured micropulsation fields can be
expressed as the sum of the input and the reflected wave.

$$B_{R,x}(f) = B^{s}_{x}(f) + B^{r}_{x}(f)$$

$$B_{R,y}(f) = B^{s}_{y}(f) + B^{r}_{y}(f) \qquad (1)$$

The measured z direction field is the reflected wave.

$$B_{R,z}(f) = B^{r}_{z}(f) \qquad (2)$$

From (1) and (2) the relation between the micropulsation input field and the measured
magnetic field can be expressed as,

where $A_{ij}(f)$ is the filter coefficient. (3) can be simplified as follows.

The environment where the detector sensor is located is different from that around the
reference sensor and is expressed as another filter with different coefficients; the measured
magnetic field at that point can be expressed as follows.

The difference of the input micropulsation fields at the two places can be expressed as,

bim)= *;(/)+<,(/)  (6) 


In (6), $B_{n,x}(f)$ and $B_{n,y}(f)$ are noise terms due to the spatial incoherency or gradient of
the micropulsation field. Let us define the filter transformation operator $C$, which reflects the
difference of the environments around the two sensors.

Then, we can obtain equation (8).

In (8), the right-hand terms are due to the incoherency of the source waves. In practice, they can
also be generated by short range magnetic fields which strongly affect only one sensor. In the
realization of the digital magnetic filter, we obtain the filter coefficients $C_{ij}$ which minimize
the magnitude of the right-hand term in (8). Once the filter coefficients $C_{ij}$ are determined in the
frequency domain, we can obtain the FIR filter coefficients in the time domain, and the
real-time digital filter outputs can be calculated in the time domain.
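
The coefficient estimation can be sketched for a single field component. The code below is a simplified illustration, not the authors' algorithm: it assumes one axis instead of the full set of coefficients $C_{ij}$, estimates one complex coefficient per frequency bin by least squares over several data acquisition windows recorded without a source, and uses synthetic data in which the detector sees 0.8 times the reference field plus weak incoherent noise. All names and numbers are hypothetical.

```python
import cmath
import math
import random


def dft(x):
    """Plain O(N^2) discrete Fourier transform of a list of samples."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]


def estimate_coefficients(ref_windows, det_windows):
    """Per-bin C(f) minimising the sum over windows of |D_w(f) - C(f) R_w(f)|^2."""
    n_bins = len(ref_windows[0])
    num = [0j] * n_bins
    den = [0.0] * n_bins
    for ref, det in zip(ref_windows, det_windows):
        R, D = dft(ref), dft(det)
        for k in range(n_bins):
            num[k] += D[k] * R[k].conjugate()   # cross-spectrum detector/reference
            den[k] += abs(R[k]) ** 2            # power spectrum of the reference
    return [num[k] / den[k] if den[k] > 0.0 else 0j for k in range(n_bins)]


def simulate(n_windows=16, n_samples=32, gain=0.8, noise=0.01, seed=0):
    """Synthetic source-free data: detector = gain * reference + incoherent noise."""
    rng = random.Random(seed)
    refs, dets = [], []
    for _ in range(n_windows):
        ref = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        det = [gain * r + rng.gauss(0.0, noise) for r in ref]
        refs.append(ref)
        dets.append(det)
    return refs, dets


if __name__ == "__main__":
    refs, dets = simulate()
    C = estimate_coefficients(refs, dets)
    print("largest deviation of C(f) from 0.8:", max(abs(c - 0.8) for c in C))
```

An inverse transform of the estimated C(f) would give the FIR taps used for the time-domain filtering described above; that step is omitted here.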


III.  Computer  Simulation  Results 

To test the performance of the proposed signal processing algorithm, we simulated the
digital filter outputs under various conditions. The simulation inputs are the magnitude of the
earth magnetic field, the misalignments of the two sensors, the coherent and incoherent
components of the noise, and the parameters for the construction of the digital filter, such as
the number of samples per data acquisition window, the number of data acquisition windows, and
the number of FIR filter taps.

We can see from Fig. 1 that the proposed geological magnetic filter compensates the mean
value of the misalignment of the sensor orientations.


(a) $B_{D,x}(t)$   (b) $B_{D,x}(t) - B_{R,x}(t)$   (c) Geological magnetic filter output

Fig. 1. Misalignment compensation result using the geological magnetic filter

We performed simulations for the case when a magnetic source appears. It is assumed that the
detector sensor, the reference sensor and the source are aligned along the x axis. The distance
between the detector and reference sensors is 10 m. At t = 0, the magnetic sensor is at x = 10 m
and moves at a constant speed of 10 m/s. Fig. 2 shows the results for $B_{D,x}(t)$,
$B_{D,x}(t) - B_{R,x}(t)$ and the geological magnetic filter output. The calculated variance of
$B_{D,x}(t) - B_{R,x}(t)$ is 50 nT², nearly three times less than that of the geological magnetic
filter output. This is the expected result because we assumed no frequency dependence of the
environments around the detector and reference sensors.

(c) Geological magnetic filter output
Fig. 2. Simulation results when magnetic source appears


IV.  Conclusion 

A digital filter signal processing algorithm that can improve the signal to noise
ratio of the magnetic detection system has been presented, and numerical simulation results
have been shown in this paper. The digital filter is constructed using two 3-axis magnetic
sensors, a detector sensor and a reference sensor. The main objective of the proposed digital
filter is to cancel out the coherent magnetic noises using the two sensors. From the simulation
results we could also observe that the proposed digital filter can effectively remove the sensor
misalignment effects and the localized noises due to short range magnetic sources.

Abstract. In this paper, we present a modified 4-IF all-digital
downconversion/decimation technique. The original 4-IF method provides an alternate
way to perform downconversion from an intermediate frequency (IF) placed
at fs/4 into the baseband by purely digital means. It results in a very simple
and power efficient architecture. We show that some flexibility in IF settings
can be achieved by a scheme very similar to the 4-IF technique. In this
modified 4-IF scheme, the IF is not fixed at fs/4 but rather can be selected
from a limited set of possible frequencies. We also show that this modification
causes only a small increase of the design area and estimated power
consumption.

I. Introduction

In current telecommunication applications the sharp dividing line between the analog and digital
circuitry (conventionally implementing the IF and baseband parts) does not exist anymore. Rather,
combinations of analog and digital implementation with emphasis on low power consumption
are investigated and used. The key is a trade-off between programmability (or flexibility)
and performance. In this paper we discuss a digital scheme for downconversion of
a QAM signal from IF to the baseband. A conventional downconversion approach with a mixer
and a sine wave generator is difficult to realize by digital means outside the baseband or a very
low IF, since it uses a mixer (multiplier) working at the highest frequency. On the other hand, a
digital implementation usually requires the use of decimation in order to obtain a power
efficient solution. If this is the case, an alternate approach utilizing properties of decimation can
be exploited to accomplish the downconversion simultaneously at practically no cost. The basic
idea is indicated in Fig. 1.

Figure 1: Decimation and downconversion

Both schemes in Fig. 1 are functionally equivalent; only the frequency shift of the signal
(i.e. downconversion) in the first case is replaced by a frequency shift of the
anti-alias filter (i.e. using a HP filter instead of the LP prototype). This transformation is well
known from filter bank theory. The whole system as shown in Fig. 1 can be considered to be a
part of a filter bank tuned to a selected frequency band. However, compared to ordinary filter
banks, the goal in this case is not to obtain a perfect reconstruction, but rather to downconvert
one selected band into the baseband with specified requirements. This often results in a
simpler design process and a cheaper implementation. The obvious disadvantage of this approach
lies in the fact that the actual frequency to downconvert is transformed into the modified filter
coefficients, which makes an eventual change of IF more difficult. In this paper, we propose a
modified 4-IF downconversion scheme which grants a limited freedom in IF settings at the cost of
a small hardware overhead. The rest of the paper is organized as follows: in the next section, the
4-IF sampling scheme is described in more detail. In section 3, the downconversion from an IF
different from fs/4 is discussed, followed by the description of the modified 4-IF scheme. The
experimental results are given in section 5 and conclusions are drawn in section 6.

II.  4-IF  Digital  Downconversion 

The 4-IF downsampling technique is used for downconversion of a signal centered around fs/4
into the baseband with simultaneous decimation by a factor R = 4. It takes advantage of the fact
that the mixer values for this IF position can only be 1, -1 and 0 (i.e. the values of sin(2πn/4)
and cos(2πn/4)). Mixing of the input signal with these numbers, combined with a polyphase
decomposition of the anti-alias filter during downconversion by 4, results in a very efficient
hardware implementation, since one filter is necessary for both the I and Q components, as shown
in Fig. 2.

Figure 2: 4-IF sampling method

This idea was first introduced in [1] in a QAM modulator/demodulator design working
at a high frequency (fs = 200 MHz). It was further extended in [2], where various multistage
configurations derived from the basic 4-IF scheme were tested and compared with respect to
their area/power figures. For the design of multiple filter stages the Interpolated FIR Method was
used. As an example, a CDMA cellular circuit proposed by Qualcomm (originally an analog circuit)
was analyzed. The final 4-IF downconversion scheme from [2] is shown in Fig. 3.


Figure  3:  4-IF  downconversion  system  proposed  in  [2] 
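
The equivalence behind the 4-IF method can be checked with a short sketch. It compares a direct implementation (mix with cos and sin at fs/4, filter, keep every fourth output) with a polyphase form in which the four branch outputs y0..y3 are only added or subtracted. The function names and the test data are assumptions made for illustration; the sign convention for Q follows the fs/4 line of the decoder equations given later in this paper (I = y0 - y2, Q = y3 - y1).

```python
import random

COS4 = (1, 0, -1, 0)   # cos(2*pi*n/4) for n mod 4 = 0, 1, 2, 3
SIN4 = (0, 1, 0, -1)   # sin(2*pi*n/4) for n mod 4 = 0, 1, 2, 3


def mix_filter_decimate(x, h):
    """Direct form: multiply by the fs/4 carriers, apply FIR h, decimate by 4."""
    i_out, q_out = [], []
    for m in range(len(x) // 4):
        acc_i = acc_q = 0.0
        for k, hk in enumerate(h):
            n = 4 * m - k
            if n >= 0:
                acc_i += hk * x[n] * COS4[n % 4]
                acc_q += hk * x[n] * SIN4[n % 4]
        i_out.append(acc_i)
        q_out.append(acc_q)
    return i_out, q_out


def polyphase_4if(x, h):
    """Polyphase form: branch p uses taps h[4j+p]; no multiplication by the carrier."""
    i_out, q_out = [], []
    for m in range(len(x) // 4):
        y = [0.0, 0.0, 0.0, 0.0]
        for k, hk in enumerate(h):
            n = 4 * m - k
            if n >= 0:
                y[k % 4] += hk * x[n]
        i_out.append(y[0] - y[2])
        q_out.append(y[3] - y[1])
    return i_out, q_out


if __name__ == "__main__":
    rng = random.Random(1)
    x = [rng.uniform(-1.0, 1.0) for _ in range(64)]
    h = [rng.uniform(-1.0, 1.0) for _ in range(12)]
    i1, q1 = mix_filter_decimate(x, h)
    i2, q2 = polyphase_4if(x, h)
    print("max |I| difference:", max(abs(a - b) for a, b in zip(i1, i2)))
    print("max |Q| difference:", max(abs(a - b) for a, b in zip(q1, q2)))
```

Because the carrier values are only 1, 0 and -1, the mixer disappears into the signs with which the branch outputs are combined, which is why a single polyphase filter can serve both components.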


III. Downconversion from arbitrary IF

The possibility of using a frequency shifted anti-alias filter coupled with a decimator as a
downconvertor for an IF different from fs/4 was investigated in [3]. This approach was proven to be
directly applicable for any IF satisfying Eq. 1, where R is the decimation ratio and N is an integer.

$$IF = f_s N / R \qquad (1)$$

In that case there is no need for an additional mixer at any stage during the downconversion,
since for these frequencies the IF is translated directly into the baseband. It was further
shown in [3] that the frequency shifted anti-alias filter coefficients can be calculated from their
LP prototypes by a complex rotation (see Fig. 4).

Figure 4: Calculation of frequency shifted filter coefficients
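
The complex rotation of the prototype coefficients can be sketched as below. The rotation direction, the helper names and the moving-average prototype are assumptions for illustration; the exact procedure of [3] is not reproduced in this paper. Each tap h[k] of the LP prototype is multiplied by exp(j 2π N k / R), which moves the passband from 0 to IF = fs N/R.

```python
import cmath
import math


def possible_ifs(R):
    """Normalised IF values N/R (in units of fs) below fs/2 that satisfy Eq. 1."""
    return [N / R for N in range(R // 2)]


def shift_filter(h_lp, N, R):
    """Rotate the LP prototype taps so that the passband is centred on fs*N/R."""
    return [hk * cmath.exp(2j * math.pi * N * k / R) for k, hk in enumerate(h_lp)]


def response(h, f_norm):
    """Frequency response of FIR taps h at normalised frequency f_norm = f/fs."""
    return sum(hk * cmath.exp(-2j * math.pi * f_norm * k) for k, hk in enumerate(h))


if __name__ == "__main__":
    prototype = [1.0 / 8] * 8          # hypothetical 8-tap moving-average LP prototype
    for N in range(4):
        shifted = shift_filter(prototype, N, 8)
        print("IF = %d/8 fs, gain at IF: %.6f" % (N, abs(response(shifted, N / 8))))
```

With a total decimation ratio R = 8, `possible_ifs(8)` gives 0, 1/8, 1/4 and 3/8 of fs.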


IV.  Modified  4-IF  Downconversion  Scheme 

We start from the original 4-IF scheme in Fig. 3. The set of possible IFs fulfilling Eq. 1 in
this case is IF ∈ {0, fs/8, fs/4, 3fs/8}. This implies the integration of four different anti-alias
filters (i.e. one for each IF). Since these filters do not have to work simultaneously (only one
band is downconverted at any time), a single filter with 4 selectable sets of coefficients can be
used. The issue of multiple shape FIR filter design was discussed in [4]. The obvious problem
is the integration of the anti-alias filters for IF = fs/8 and IF = 3fs/8. The downconversion
from fs/8 and 3fs/8 for the I branch of the downconvertor, with the indicated distribution of mixer
multiplicands through the polyphase structure, is shown in Fig. 5.

(a) fs/8 downconversion   (b) 3fs/8 downconversion

Figure 5: Downconversion from fs/8 and 3fs/8

This indicates the necessary modifications of the filters for the fs/8 and 3fs/8 cases. From a
comparison of Fig. 2 and Fig. 5 we can see that the filters h(4k) and h(4k + 2) can be
implemented as shown in Fig. 6(a), since the multiplicands differ only in signs.

Figure 6: Implementation of the filters

The branches h(4k + 1) and h(4k + 3) must integrate two different shapes: the "original" 4-IF
shape and the scaled one (with scaling factor A = √2/2).
We have found that scaling the filter coefficients results in a cheaper implementation
compared to an additional scaling of the filter outputs. The multishape filters are implemented
as in Fig. 6(b) (where the realization of a single tap is shown). The filter coefficients were encoded
in CSD format and the different coefficients were implemented by programmable shifts in the
CSD add-shift structure as in Fig. 6(c) (for more details see [4]). In the decoder unit, the four
filter outputs are added according to Eq. 2 to produce the correct I and Q components for each
IF.

$$\begin{aligned}
I_d &= Q_d = y_0 + y_1 + y_2 + y_3 && \text{for } IF = 0 \\
I_d &= y_0 + y_1 + y_3; \quad Q_d = y_1 + y_2 - y_3 && \text{for } IF = f_s/8 \\
I_d &= y_0 - y_2; \quad Q_d = y_3 - y_1 && \text{for } IF = f_s/4 \\
I_d &= y_0 - y_1 - y_3; \quad Q_d = y_1 - y_2 + y_3 && \text{for } IF = 3f_s/8
\end{aligned} \qquad (2)$$

Finally, since the signals at fs/8 and 3fs/8 are centered around fs/2 after the decimation by
4, the signal must be frequency shifted by π before the decimation by 2 can take place. The
shift by π can be realized very efficiently in hardware since it requires only multiplications by
1 and -1 (see Fig. 6(d)). The complete scheme of the downconversion unit with selectable IF is
shown in Fig. 7.
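
The CSD (canonical signed digit) coding of the filter coefficients mentioned in this section can be illustrated with a small sketch. It converts an integer coefficient into digits from {-1, 0, 1} with no two adjacent nonzero digits, so that every nonzero digit corresponds to one shifted add or subtract. The 8-bit quantisation of A = √2/2 is an assumption chosen for the example; the word lengths used in the actual design are not stated here, and `to_csd` and `from_csd` are hypothetical helper names.

```python
import math


def to_csd(value):
    """Return the CSD digits of an integer, least significant digit first."""
    digits = []
    x = value
    while x != 0:
        if x & 1:
            d = 2 - (x & 3)    # +1 if x mod 4 == 1, -1 if x mod 4 == 3
            x -= d             # x is now a multiple of 4, so the next digit is 0
        else:
            d = 0
        digits.append(d)
        x >>= 1
    return digits


def from_csd(digits):
    """Rebuild the integer from its CSD digits (least significant digit first)."""
    return sum(d << i for i, d in enumerate(digits))


if __name__ == "__main__":
    a_fixed = round(math.sqrt(2.0) / 2.0 * 256)   # A quantised to 8 fractional bits (assumed)
    digits = to_csd(a_fixed)
    print("A * 256 =", a_fixed, "CSD digits (LSB first):", digits)
    print("nonzero digits:", sum(1 for d in digits if d != 0))
```

Switching between two coefficient shapes then amounts to selecting different shift amounts in the same add-shift structure, which is the idea behind the programmable shifts described above.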


[Fig. 7 stages: decimation by 4 (multishape filter + rotation), then decimation by 2 (single shape filter); frequency axis 0, Fs/8, Fs/4, 3Fs/8, Fs/2]


Figure  7:  Modified  4-IF  downconversion  system  with  selectable  frequency 
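
The frequency shift by π used before the final decimation by 2 can be sketched in a few lines: multiplying sample n by exp(jπn) = (-1)^n moves a signal centered at fs/2 to the baseband. The helper name and the test signal are assumptions for illustration.

```python
def shift_by_pi(samples):
    """Multiply sample n by (-1)**n, i.e. negate every odd-indexed sample."""
    return [s if n % 2 == 0 else -s for n, s in enumerate(samples)]


if __name__ == "__main__":
    envelope = [0.5 + 0.01 * n for n in range(16)]               # slowly varying baseband signal
    at_half_fs = [v * (-1) ** n for n, v in enumerate(envelope)]  # same signal centered at fs/2
    print(shift_by_pi(at_half_fs) == envelope)
```

The same operation is applied to both the I and the Q sample streams, and it needs only sign changes, no multiplier.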

V.  Experimental  Results 

We have synthesized the original 4-IF system proposed in [2] and our modified circuit by means
of Synopsys Synthesis Tools into the 0.7 µm CMOS Alcatel-Mietec library. The power estimates
have been performed by Synopsys Design Power Tools. The results are shown in Table 1.

Table 1: Synthesis Results

The area is given in the equivalent number of IVA gates. The stages were synthesized with clock
frequencies of 20, 5 and 1.25 MHz, but the critical path of the 4-IF part was only 13.54 ns, which
indicates the possibility of using it at much higher frequencies.

VI.  Conclusions 

The presented technique shows the trade-off between flexibility and area/power consumption
parameters in the digital downconvertor. In our example, the limited programmability of IF
settings results in an increase of the design area of the 4-IF block by a factor of 1.3 and of the
estimated power consumption by a factor of 1.44 (1.12 and 1.2, respectively, if the whole circuit
is considered) compared to the original 4-IF technique. It demonstrates the possibility of
processing an IF signal effectively by digital means also at higher frequencies, which can simplify
the design of the analog circuitry.

References 

[1] B.C. Wong and H. Samueli, "A 200 MHz all digital modulator and demodulator in 1.2-um
CMOS for digital radio application", IEEE J. SSC, vol. 26, pp. 1970-1980, Dec. 1991.

[2] S. Jou, S. Wu, and C.H. Wang, "Low-Power Multirate Architecture [...] Down Converter",
IEEE Trans. Circuits and Systems II, vol. 45, pp. 1487-1494, Nov. 1998.

[3] P. Schaumont, S. Vernalde, M. Engels, and I. Bolsens, "Low Power Digital Frequency
Conversion Architectures", Kluwer J. VLSI Signal Proc., vol. 18, pp. 187-19 , 9 .

[4] R. Pasko, P. Schaumont, S. Vernalde, and D. Durackova, "Efficient Implementation of
Multiple Shape FIR filters", Proc. of EDS conf., pp. 197-200, Brno, Jul. 1998.


2nd  Electronic  Circuits  and  Systems  Conference 
September  6-8,  1999,  Bratislava,  Slovakia 




Feasibility of a Fully Digital Radio-Frequency Stage for a DAVIC
Compliant Modem Application.

J.Ph.  Lambert,  A.  Dandache,  F.  Monteiro,  B.  Lepley. 

LICM  /  CLOES  /  SUPELEC,  University  of  Metz 
2  rue  E.  Belin,  57078  METZ  Cedex  03,  FRANCE 
Tel: +33.(0)3.87.74.61.00, Fax: +33.(0)3.87.20.33.87,

Email: lambert@ese-metz.fr

Abstract: In this paper, we study the feasibility of a fully digital radio-frequency stage
dedicated to multimedia applications compliant with the DAVIC recommendation. We focus in
particular on the 1.544 Mbits/s rate using DQPSK modulation. After an adequate block diagram
of the architecture was chosen, its blocks were described in VHDL for synthesis. The
functional simulation of the complete system was performed with the Ptolemy software.

I.  Introduction. 

Designing a modem application compliant with the DAVIC [1] recommendation using
analogue techniques is a quite complex operation [2]. Recent realisations [3-5] clearly show that
the current tendency is to design systems using digital techniques, especially in the
telecommunication area, in order to remove most of the analogue design problems. The digital
architecture of our design has been defined from the specification of the DAVIC
recommendation. Each block of the radio-frequency stage has been described in VHDL,
synthesised and simulated with the MaxPlusII software from ALTERA. The functional
verification of the full system has been performed using the Ptolemy software from Berkeley
University.


Fig. 1: DAVIC recommendation radio-frequency transmission line.

II. DAVIC Recommendation constraints.

Figure 1 shows the global architecture of the digital radio-frequency stage for a
modem compliant with the DAVIC recommendation [1]. This recommendation defines two
transmission bands: the upstream band from 10 MHz up to 30 MHz, and the downstream band
from 50 MHz up to 87 MHz. In each one, frequency agility must be ensured. Furthermore,
the characteristics of the intermediate frequency filter are fixed (10th order, 772 kHz
bandwidth, and a rejection level of 30 dB between the modulated signal and the demodulation
carrier). The use of in-phase and quadrature modulation (demodulation) is recommended.


Fig.  2:  Overview  of  the  signals  performed  by  the  digital  radio-frequency  stage. 


III. Architecture and VHDL modelling.

The modulation, demodulation, mixer, and filter blocks make extensive use of multiplication,
summation and multiplexing. As most of these functions are time-consuming, the overall
structure has to be changed in order to raise the clock rate. Thanks to its internal latches, the
pipeline approach allows a substantial increase of the data throughput. Simulations
have been performed for several architectures including an array multiplier [6], a ripple-carry
adder [7], a Wallace-tree adder [7], and a decimation-FIR filter [5]. The maximal clock
frequency is the same for all of them and equals 116.27 MHz on the FLEX10K FPGA family
from ALTERA.

IV. Simulation results on the Ptolemy software.

The simulation with the Ptolemy software makes it possible to validate the complete functional
operation of the architecture. The intermediate frequency was chosen taking into account a
previous study on analogue radio-frequency stages [2]: it was set to 20 MHz. In order to
simplify the functional block implementation, 16 sample values per period were used to
describe the digital signals. Thus, to perform spectral shifting in each transmission band,
three clock rates must be used. The first one equals 320 MHz and is used to generate the
sine and cosine waves in the modulator and the demodulator, which both work at the
20 MHz intermediate frequency. The other two can vary from 480 MHz to 1.072 GHz, and are
used to generate the sine carrier needed to perform frequency transposition (30-67 MHz).
Figure 2 shows the signals obtained in a noise-free transmission. Simulations using additive
white Gaussian noise indicate that the digital radio-frequency stage achieves a higher
transmission quality than the usual analogue equivalents [2]. In the digital design, the first
transmission errors appear at a signal to noise ratio close to 12.5 dB, while an efficient
analogue design requires a signal to noise ratio of 17 dB. Figure 3 shows the bit error rate as a
function of the signal to noise ratio for the 10 MHz and the 87 MHz channels.
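The clock rates quoted above follow directly from the choice of 16 samples per period. The short sketch below reproduces that arithmetic; the function name is illustrative and not taken from the paper.

```python
# Number of samples used to describe one period of each digital sine wave (from the text).
SAMPLES_PER_PERIOD = 16


def clock_rate_mhz(signal_mhz: float, samples: int = SAMPLES_PER_PERIOD) -> float:
    """Clock rate needed to generate a sine wave of signal_mhz with the given samples per period."""
    return signal_mhz * samples


# Modulator and demodulator work at the 20 MHz intermediate frequency.
if_clock = clock_rate_mhz(20.0)
# The transposition carrier spans 30 MHz to 67 MHz.
carrier_clocks = (clock_rate_mhz(30.0), clock_rate_mhz(67.0))

print(if_clock, carrier_clocks)
```

The results are 320 MHz for the intermediate-frequency stages and 480 MHz to 1072 MHz for the transposition carrier, which matches the figures given in the text.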


Fig. 3: Bit Error Rate as a function of Signal to Noise Ratio (curves: BER for 10 MHz and
BER for 87 MHz).


V.  Conclusion. 

The simulation of a digital radio-frequency stage dedicated to a DAVIC compliant
modem has been carried out. This study, focusing on the 1.544 Mbits/s transmission rate with
DQPSK modulation, points out the main problems in designing such an architecture. Our
approach showed, using functional simulation with the Ptolemy software, that our architecture
achieves a better transmission quality than an analogue radio-frequency stage does. The high
clock rates required are obtained using pipeline techniques. According to the state of the art,
the design cannot be implemented using only CMOS technology. A faster technology (such as
GaAs, ECL, or BiCMOS) is needed to reach the high clock rates required. In the near future,
an implementation of the complete system should be carried out to validate the architectural
design.

Abstract

For next generation services, telecommunication systems require the maximum possible
bandwidth efficiency with guaranteed QoS at an optimum transmission bit rate. A system can
attain these three characteristics only when the queuing traffic and delay are kept to a minimum.
For this purpose, a novel ATM system over a PON access network using fuzzy logic sets is
designed and simulated. In this system, to avoid the over-saturated queuing problem caused by
cell multicasting, the Lock and Key multicasting-multiplexing approach is modelled. This
minimises the queuing delay and raises the number of cells transmitted to end-subscribers to
2.75 times that of a normal system. Furthermore, a dynamic time division multiplexing adapter
that distributes the available bit rate over the active subscribers is implemented on the upstream
transmission links. In a system with 12.63% ABR, this adapter improves the number of cells
transmitted during the upstream transmission by 14.5%, with an average delay reduction of up
to 40%.

1. Introduction:

In view of future telecommunication demands and services, ATM is considered to be the
foundation on which future telecommunication systems will be built. From this point of view,
ATM systems should be upgraded to meet the requirements of both near and long term future
telecommunication services. This upgrade should provide the maximum possible bandwidth
efficiency with guaranteed Quality of Service (QoS) at an optimum transmission bit rate. These
three characteristics can be achieved only when the queuing traffic and delay are kept to a
minimum [1-4].

In this paper we present an advanced ATM switch with a multicasting-multiplexing
function that avoids losing bandwidth by minimising the bandwidth used by repeated
common signals and by supporting the ABR services [1-8]. This leads to a reduction in cell
queuing traffic, which guarantees better QoS. It also affects the transmission bit rate, whose
increase is directly proportional to the decrease in the cell queuing traffic. Therefore, the
overall network capacity is improved [3-7].

Considering ATM's ability to provide central office broadband switching and to
deliver multiple services over a cell based transport technology, it can be used to transport the
information stream from the source up to the access node over a backbone network. This
network connects several access networks to the source server or the core, which means that
more services can be provided as the network capacity improves by sharing the common
capacity and reducing repeated signals to a single copy, especially in point-to-multipoint
(P-M) services. The transport network can be spread over several locations, areas, cities or
districts and connect multiple access networks. Therefore, it is known as the regional transport
network [3-5] and [8-12].

To complete the end-to-end transportation links, signals are transported from each access
node to the end-subscribers through access networks. The access network that connects the
access node or the central office, where the ATM switches are placed, to the end-subscribers
multicasts and multiplexes the ATM cells of the service signals to their specific end-subscribers
according to the cell header information. For this purpose, and to take full advantage of ATM's
capability, several access networks with different service coverage limitations over different
topologies are possible. Taking into account future telecommunication demands and services,
the most suitable access networks are the active double or single star and the ATM-PON
networks [8-9].

In an active double or single star access network, the central office is connected to each
subscriber through an individual fibre link. It therefore provides each user with 155 Mb/s
bidirectional transmission links, which means it can cover all the telecommunication services
described previously. However, it is the most costly access network, and its bandwidth
efficiency during the upstream transmission is low because the bit rate available to each
subscriber during the upstream transmission is unused most of the time [8].

The other suitable access network, which is recommended for the next generation, is the
ATM-PON access network. It has the advantage of low cost and high bandwidth efficiency
compared with the active double or single star network. Although this access network provides
each user with 155 Mb/s downstream transmission links, the upstream transmission links are
limited to 155/n Mb/s, where n is the number of access network subscribers [8].

2. Fuzzy Logic Sets:

As an outcome of merging the techniques of traditional rule-based expert systems, fuzzy set
theory and control theory, fuzzy control departs significantly from traditional control theory,
which is essentially based on mathematical models of the controlled process. Instead of
deriving a controller by modelling the controlled process quantitatively and mathematically,
the fuzzy control methodology tries to establish the controller directly from domain experts
or operators who are controlling the process manually and successfully. Clearly, this is a
typical characteristic of an expert system, where primary attention is paid to the human's
behaviour and experience rather than to the process being controlled [13-14].

Normally, applying fuzzy control in numerical environments employs two
procedures: fuzzification and defuzzification. The fuzzification procedure converts numerical
values (x) into fuzzy values (X), and the inferred values (Y) are converted by the
defuzzification procedure into crisp values (y) that are compatible with the numerical
environment [13-14].

In our case, three numerical inputs (x1, x2, x3), which are the routing information
category elements of the received cells, enter the fuzzification procedure. The results are as
follows: X1 - the cell's signal information and details; X2 - the number of the signal's
end-subscribers and the required number of copies of each cell; X3 - the addresses of the
signal's end-subscribers.

According to the fuzzy values X2 and X3, the fuzzy sets within the inference engine
control the cell's multicasting-multiplexing route. For this aim the fuzzy sets are in the
following form:

If X2 = "Situation" and X3 = "Situation" Then R = "Action"  (1)

The action depends on the multicasting-multiplexing approach. Finally, the
inference engine's outputs, which are the action's results, are defuzzified into the numerical
environment.

3. Lock and Key Multicasting-Multiplexing Approach:

This approach is designed to avoid the over-saturated queuing problem caused by cell
multicasting within the access node. In this approach, a specific decoder called the lock
identifies each subscriber. Each cell, according to the signal that it belongs to and its
multicasting-multiplexing information, is addressed with a specific code called the key. The
key depends on the signal's end-subscribers; in other words, each code is a master key that
opens the locks of all the signal's end-subscribers.

In our simulation, we have designed digital binary locks and keys with n digits, where
n is the total number of access network subscribers. Each digit represents a specific
subscriber. Each subscriber's lock is identified by setting the digit that represents it to
one and leaving the rest of the digits blank. In each key, the digits that represent the cell's
end-subscribers are set to ones and the rest to zeroes.

Fuzzy logic sets and their actions are used in designing and modelling the
required key for each cell. The multicasting-multiplexing route consists of a key maker and
an addresser. According to the fuzzy sets' action and the fuzzy values X2 and X3, the key maker
designs and models the required key and passes it to the addresser, where it is placed on
the cell. Through the defuzzification procedure the key is converted to its numerical
equivalent and passed to all the subscribers via the transmitter. The end-subscribers
whose lock is opened by the key, through the function of an AND gate, get access to the
cell; otherwise access is prohibited.

Furthermore, to ensure network security and privacy, fuzzy sets are arranged to
control the filled digits in each subscriber's lock. The owner of any lock that has more than
one filled digit is shut off from the network.
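The binary lock and key scheme described above can be sketched with integer bit masks. This is a minimal illustration, not the authors' simulation: the function names are hypothetical, subscribers are assumed to be numbered from 0 to n-1, and the fuzzy key maker is replaced by a plain loop over the end-subscriber list.

```python
def make_lock(subscriber: int, n: int) -> int:
    """One-hot lock: only the digit (bit) that represents this subscriber is set."""
    if not 0 <= subscriber < n:
        raise ValueError("subscriber index out of range")
    return 1 << subscriber


def make_key(end_subscribers, n: int) -> int:
    """Key: the digits of all end-subscribers of the cell are set to one, the rest to zero."""
    key = 0
    for s in end_subscribers:
        if not 0 <= s < n:
            raise ValueError("subscriber index out of range")
        key |= 1 << s
    return key


def can_access(lock: int, key: int) -> bool:
    """AND gate: the lock opens only if its digit is also set in the key."""
    return (lock & key) != 0


def lock_is_valid(lock: int) -> bool:
    """Security check: a lock with more than one filled digit is rejected.

    Rejecting an all-zero lock as well is an assumption of this sketch.
    """
    return bin(lock).count("1") == 1


n = 8
key = make_key([1, 4, 6], n)              # cell addressed to subscribers 1, 4 and 6
print(can_access(make_lock(4, n), key))   # subscriber 4 is an end-subscriber
print(can_access(make_lock(3, n), key))   # subscriber 3 is not
print(lock_is_valid(0b0110))              # tampered lock with two filled digits
```

With a one-hot lock, a subscriber can only ever match its own digit in the key, which is why a lock with several filled digits is treated as a security violation.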

4.  Dynamic  Time  Division  Multiplexing  Adapter: 

In addition to what the lock and key approach achieves over the ATM-PON access network,
the ATM-PON limitation during upstream transmission should be relaxed to cover almost all
the telecommunication services. This can be done by implementing a dynamic time division
multiplexing adapter over the upstream transmission links. This adapter provides each
active subscriber with 155 Mb/s transmission links in specific time slots.

The dynamic adaptation employed in this adapter controls the time slot length
and distributes the available bit rate over the active subscribers. The fuzzy sets involved in
this adapter are in the following form:

If Sm = "Situation" Then Ta = "Action"  (2)

The situation of Sm represents whether subscriber m requests to be active or not, and
the action is then to assign a specific time slot to subscriber m or not. The time slots assigned
to an active subscriber are cancelled and reassigned to other active subscribers as soon as
that subscriber's data transmission is finished.
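The effect of rule (2) can be illustrated with a small slot allocator. The sketch below is an assumption-laden simplification: it splits a frame of equal-length slots evenly among the active subscribers, whereas the paper's adapter also controls the slot length through fuzzy sets. All names are hypothetical.

```python
from typing import Dict

LINK_RATE_MBPS = 155.0  # upstream link rate shared by all subscribers (from the text)


def static_share_mbps(n_subscribers: int) -> float:
    """Fixed upstream share of plain ATM-PON: 155/n Mb/s per subscriber."""
    return LINK_RATE_MBPS / n_subscribers


def assign_slots(requests: Dict[str, bool], frame_slots: int) -> Dict[str, int]:
    """Rule (2): a subscriber that requests to be active gets time slots, the others get none.

    The even split among active subscribers is an assumption of this sketch.
    """
    slots = {m: 0 for m in requests}
    active = [m for m, wants in requests.items() if wants]
    if not active:
        return slots
    base, extra = divmod(frame_slots, len(active))
    for i, m in enumerate(active):
        slots[m] = base + (1 if i < extra else 0)
    return slots


def average_rate_mbps(slots: Dict[str, int], frame_slots: int) -> Dict[str, float]:
    """Average rate per subscriber: full link rate during its own slots only."""
    return {m: LINK_RATE_MBPS * s / frame_slots for m, s in slots.items()}


requests = {"A": True, "B": False, "C": True, "D": False}
slots = assign_slots(requests, frame_slots=8)
rates = average_rate_mbps(slots, frame_slots=8)
print(static_share_mbps(len(requests)), slots, rates)

# When A finishes, its slots are cancelled and reassigned to the remaining active subscriber.
requests["A"] = False
slots_after = assign_slots(requests, frame_slots=8)
print(slots_after)
```

In this example the two active subscribers each average 77.5 Mb/s instead of the fixed 38.75 Mb/s share, and the last active subscriber receives the whole frame once the other has finished.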

5.  Simulation  Results: 

Figure 1 shows the queuing delay at the core's transmitter buffer and at an access network
multiplexer's transmitter buffer for two simulated systems: a normal system and a system
with the Lock and Key approach. Figure 1 makes clear that implementing the Lock and
Key multicasting-multiplexing approach reduces the queuing delay to the minimum and that
the number of cells passing the multiplexer's transmitter is 2.75 times higher than in the
normal system.


Figure 1: Queuing Delay against Cells Passed the Transmitters


The result of implementing the dynamic time division multiplexing adapter during the
upstream transmission is shown in figure 2. For a system with 12.63% ABR, this adapter
improves the number of cells transmitted during the upstream transmission by 14.5%, with an
average delay reduction of up to 40%.

Figure 2: Dynamic Time Division Multiplexing Adapter Result


6. Conclusions:

From this paper it can be concluded that implementing the lock and key
multicasting-multiplexing approach avoids the over-saturated queuing problem caused by cell
multicasting. Furthermore, it reduces the queuing delay to the minimum and improves the
number of cells passing the multiplexer's transmitter by a factor of 2.75. Finally, in a system
with 12.63% ABR, the dynamic time division multiplexing adapter improves the number of
cells transmitted during the upstream transmission by 14.5%, with an average delay reduction
of up to 40%.

Abstract. This paper proposes an empirical formula based on the uniform
geometrical theory of diffraction (UTD) model to predict the diffracted fields from
corners of buildings. The empirical formula is derived from statistical results for
characteristic parameters such as frequency, distance parameter, and incident and
observation angles. The introduced formula makes it possible to calculate the
diffracted fields efficiently, with a small difference compared to the fields obtained
with the UTD method.


I.  Introduction 

In system planning for land mobile radio services or in service quality evaluation, it is
important to predict propagation characteristics such as the direct, reflected, and diffracted
fields around corners. Several models for predicting the propagation characteristics have
recently been reported for micro-cellular communications [1,2]. Most empirical prediction
models developed previously have been used to obtain the signal strength including diffracted
fields over the rooftops of surrounding buildings. However, since the base station of a
micro-cellular system is located below the rooftops of buildings, the diffracted fields from
corners of buildings are greater than the field over rooftops.

In this study, we propose an empirical formula to predict the diffracted fields from
corners by using statistical results obtained with the UTD formulation. According to the
literature, the UTD model gives good general agreement with the available measurements.

The diffracted field is mainly affected by the incidence and observation angles on the
scattering body, the frequency, and distance parameters such as the distance from the
diffraction point to the source and the distance from the diffraction point to the observation
point. The proposed formula is therefore constructed as a function of the above parameters.
We show that the theoretical and empirical results are in good agreement.


II.  Empirical  Modelling  of  Diffracted  Fields 

The magnitude of the diffraction coefficient is affected by the frequency, the distance
parameters, and the incidence and observation angles.

A. Analysis of Diffraction Coefficient

The diffraction coefficient is the sum of the incident diffraction coefficients, the
reflection diffraction coefficients, and the reflection coefficients on the scattering body. The
incident diffraction coefficient possesses singularities at the incident shadow boundary (ISB),
which occur when $\phi = \pi + \phi'$ and $\phi - \phi' = -\pi$, and the reflection diffraction
coefficient possesses singularities at the reflection shadow boundary (RSB), which occur when
$\phi = \pi - \phi'$ and $\phi + \phi' = (2n - 1)\pi$, as seen in figures 1 and 2. Figures 1 and 2
show the incident and reflection diffraction coefficients as a function of the observation angle
when the incidence angles are 10° and 110°, respectively.


Fig. 1. Field distribution of various components of diffraction coefficients versus
observation angle $\phi$ (degrees). (Incidence angle ($\phi'$) is 10 degrees)

Fig. 2. Field distribution of various components of diffraction coefficients versus
observation angle $\phi$ (degrees). (Incidence angle ($\phi'$) is 110 degrees)

Fig. 3. Diffracted field distribution versus the distance parameters (axis: distance
parameter log(L))

Fig. 4. Diffracted field distribution versus the various frequencies (axis: frequency [MHz])

Figure 3 shows the magnitudes of the positive maximum diffraction coefficients as a
function of the observation distance at 1800 MHz. In figure 3, the incidence distance is set to
400 m, and the incidence angle is 1.25°. Figure 4 shows the magnitude of the positive maximum
diffraction coefficients for various frequencies at distance parameter L = -1.

The solid curves represent the theoretical results, and the dotted curves represent the
empirical results. Figures 1 to 4 show that the magnitudes of the incident and reflection
diffraction coefficients vary with the frequency and distance parameters as well as with the
incidence and observation angles.


B.  Empirical  Formula  of  Diffraction  Coefficient 


In the previous section, we investigated the various parameters that affect the diffraction
coefficient. Using fig. 3 and 4, we derived the following empirical formulas for the distance
parameters and frequencies.


$$
\begin{aligned}
F_{pos} &= 0.0183(\log f)^2 - 0.2241\log f + 1.3770 && (1)\\
L_{pos} &= -0.0174(\log L)^3 + 0.8070(\log L)^2 + 0.6150\log L - 0.0128 && (2)\\
B_{pos} &= 0.0151(\log f)^2 - 0.1724\log f + 0.4592 && (3)\\
F_{neg} &= -0.0981(\log f)^2 + 0.3990\log f + 0.7352 && (4)\\
L_{neg} &= 0.4929(\log L)^3 - 3.1451(\log L)^2 + 3.5616\log L - 2.4067 && (5)\\
B_{neg} &= -0.0301(\log f)^2 + 0.2215\log f - 0.4269 && (6)
\end{aligned}
$$

where f is the frequency (in MHz) from 500 MHz to 3000 MHz, and L is the distance parameter [3].
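The fitted polynomials are cheap to evaluate, which is the practical point of the empirical formula. The sketch below evaluates equations (4) to (6) only, since their left-hand sides are legible in the source. It assumes base-10 logarithms and takes log L directly as the input of the distance polynomial; both choices, and all function names, are assumptions of this illustration.

```python
import math


def poly(coeffs, x: float) -> float:
    """Evaluate a polynomial with Horner's rule; coefficients are highest order first."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


F_NEG = (-0.0981, 0.3990, 0.7352)           # equation (4), polynomial in log f
L_NEG = (0.4929, -3.1451, 3.5616, -2.4067)  # equation (5), polynomial in log L
B_NEG = (-0.0301, 0.2215, -0.4269)          # equation (6), polynomial in log f


def _log_f(freq_mhz: float) -> float:
    """log10 of the frequency, restricted to the 500-3000 MHz range stated in the text."""
    if not 500.0 <= freq_mhz <= 3000.0:
        raise ValueError("formula is stated for 500 MHz to 3000 MHz only")
    return math.log10(freq_mhz)  # base 10 is an assumption


def f_neg(freq_mhz: float) -> float:
    return poly(F_NEG, _log_f(freq_mhz))


def b_neg(freq_mhz: float) -> float:
    return poly(B_NEG, _log_f(freq_mhz))


def l_neg(log_l: float) -> float:
    """Distance term; the argument is log L itself (assumption)."""
    return poly(L_NEG, log_l)


print(f_neg(1000.0), b_neg(1000.0), l_neg(-1.0))
```

Each term costs a handful of multiplications and additions, in contrast to the transition functions that a full UTD evaluation requires.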


Using equations (1) to (6), the empirical formula for the diffraction coefficient can be
written as equations (7) to (13). The empirical formula can be divided into two parts, the
incident coefficient ((7) to (9)) and the reflection coefficient ((10) to (13)). Each coefficient is
further divided according to the various shadow boundaries.


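$$
\begin{aligned}
D_{ref} &= P_{coef}\cdot\exp[\,\cdots\,] + B_{pos} && \text{for } RSB1 < \phi < RSB2 && (7)\\
D_{ref} &= N_{coef}\cdot\exp[\,\cdots\,] + B_{neg} && \text{for } \phi < RSB1 && (8)\\
D_{ref} &= N_{coef}\cdot\exp[\,\cdots\,] + B_{neg} && \text{for } \phi > RSB2 && (9)
\end{aligned}
$$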

where $P_{coef} = L_{pos} \cdot F_{pos}$ and $N_{coef} = L_{neg} \cdot F_{neg}$. RSB1 and RSB2 are the
shadow boundaries, with $RSB1 = \pi - \phi'$ for $\phi' < 90°$ and $RSB2 = 2\pi - \phi'$ for
$\phi' > 90°$. In addition, $C_{RSB} = (RSB2 - RSB1)/2$, $CT = RSB1 + C_{RSB}$, and
$E_{RSB} = 2\pi - RSB2$. The incident coefficients are affected by the ISB ($\pi + \phi'$). In the
case of ISB < ($2\pi$ - wedge angle),


$$
\begin{aligned}
D_{inc} &= N_{coef}\cdot\exp[\,\cdots\,] + B_{neg} && \text{for } \phi < ISB && (10)\\
D_{inc} &= P_{coef}\cdot\exp[\,\cdots\,] + B_{pos} && \text{for } \phi > ISB && (11)
\end{aligned}
$$


In the case of ISB > ($2\pi$ - wedge angle),

$$
D_{inc} = N_{coef}\cdot\exp[\,\cdots\,] + B_{neg} \qquad (13)
$$


III.  The  Empirical  Results  of  Diffracted  Fields 

Using the proposed formula, we compared the empirical results with the theoretical
results for various frequencies and numbers of diffraction points. Figure 5 shows the
diffracted fields as a function of distance at 900 MHz and 1900 MHz. Figure 6 shows the
diffracted fields for different numbers (D = 1 and 3) of diffraction points located 160 m from
the transmitter. Figures 5 and 6 show that the empirical and theoretical results agree to
within 5 dB.


Fig. 5. Diffracted field distribution according to the various frequencies

Fig. 6. Diffracted field distribution according to the different number of diffraction points

IV.  Conclusion 

In this paper, we have developed an empirical formula for calculating the
diffracted fields on the basis of the UTD model described in [3]. To derive the
formula, we analyzed the statistical properties of the incidence and observation
angles, the frequency, and the distance parameters.

The introduced formula can be applied effectively to system design, because it improves
the computational efficiency while its results remain in good agreement with the theoretical
results.

Abstract:

The paper presents the design and analysis of using the Simulated Annealing algorithm to find
optimal multicast routing in broadband telecommunication networks. Routing in broadband
telecommunication networks is the procedure that serves a connection request arriving at
the network. To create a multipoint connection in a broadband telecommunication network, the
nodes that request the connection must be connected at minimal connection cost. The results
show that the computing resources, the computing time and the cost of the designed connection
depend on the parameters of the algorithm. A good network design requires a compromise
between the cost of the connection and the computing time of the designed connection.

1.  Introduction 

Evolution optimisation algorithms [2,6,7,8] are a set of algorithms that use evolutionary processes for
problem solving, search and optimisation in complicated systems. In precise mathematical formulations of
some problems, e.g. the search for a multicast routing scheme in a broadband telecommunication network
[9,11], the number of mathematical operations grows with the second power of the number of nodes in the
network. During the scanning of a multidimensional space, the calculation often stops in a local extreme and
does not reach the required global extreme. The search for solutions to these problems motivates the use of
evolution optimisation algorithms, which look for the optimal solution by stochastic scanning of the solution
space.

2.  Routing  in  Broadband  Telecommunication  Networks 

Routing in a broadband telecommunication network is an important part of the control of the
network. It is the procedure that serves a connection request arriving at the network. The aim of
routing in the broadband telecommunication network [5,9,11,14,15] is to find the optimal
connection between two or more nodes in the network that request a connection. A connection is optimal
when its cost is the minimum over the set of all possible connections, or when the difference between
its cost and the minimal cost can be neglected, considering the size and state of the network and the time
needed to design the routing with the appropriate algorithm.


Fig.l  Routing  of  connection  in  a  broadband  telecommunication  network 

The cost of a connection is a value that depends on the distance between the nodes of the connection $d_{ij}$, the bandwidth of the connection $b_{ij}$ and the actual load in the individual links of the connection. Routing in telecommunication networks tends to route the load through the links with minimal load. Accordingly, when the load in some link increases, the cost of this link increases too. The cost is therefore affected by the total length of the links between all nodes in the connection, the number of nodes in the connection, the actual load in the network and the propagation delay in the transmission media. The objective of routing in a broadband telecommunication network is the effective use of network resources.

Fig. 1 describes the routing of a connection in the broadband telecommunication network. A demand for creating a connection arrives at the network. This demand contains the set of nodes that demand the connection and the bandwidth required for it. The input parameters for the evolution optimisation algorithm, the link matrix and the cost matrix, are created according to the actual state of the network. From these input parameters the evolution optimisation algorithm computes a routing matrix. This matrix defines which links in the network will be used for the connection. Network resources are then reserved for the designed connection according to this matrix.
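
The relation between the link matrix, the cost matrix and the routing matrix can be illustrated with a minimal Python sketch. The 4-node network, its cost values and the function name `connection_cost` are hypothetical and are not taken from the paper; load and delay are assumed to be already folded into the cost matrix.

```python
# Hypothetical 4-node example; all values are illustrative only.
INF = float("inf")  # marks a missing link

# cost[i][j]: cost of the directed link i -> j
cost = [
    [INF, 10, 40, INF],
    [10, INF, 15, 30],
    [40, 15, INF, 5],
    [INF, 30, 5, INF],
]

# link[i][j] = 1 if the link i -> j exists, else 0
link = [[0 if c == INF else 1 for c in row] for row in cost]


def connection_cost(routing, cost, link):
    """Sum the cost of every link marked 1 in the routing matrix."""
    total = 0
    n = len(cost)
    for i in range(n):
        for j in range(n):
            if routing[i][j]:
                if not link[i][j]:
                    return INF  # routing uses a link that does not exist
                total += cost[i][j]
    return total


# Routing matrix for the path 0 -> 1 -> 2 -> 3
routing = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]
print(connection_cost(routing, cost, link))  # 10 + 15 + 5 = 30
```

An optimisation algorithm would search over routing matrices and use such a function as its fitness.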


3.  Evolution  optimisation  algorithms 

Evolution optimisation algorithms (EOA) are stochastic throughout the whole course of the computation; they eventually find the global extreme, but the time needed cannot be determined in advance. When computing with an EOA it is therefore important to set the conditions under which the current result of the algorithm is accepted as the global result of the problem being solved. For this purpose a fitness is computed for each partial result. If the fitness satisfies the defined conditions, the algorithm is stopped and its result is taken as the solution of the defined problem.

Algorithm  Simulated  Annealing 

The Simulated Annealing (SA) algorithm [2,3] is derived from the idea that searching for a global extreme is analogous to the annealing of a solid body. It simulates the physical process in which a solid body is heated to some temperature and then cooled slowly. This process eliminates some defects in the crystal lattice. Here the solid body is represented by a genotype, stored as an array $x$. An energy of the solid body $f(x)$ can be assigned to this array. In annealing the energy of the solid body is minimised; in SA the function value $f(x)$ is minimised. The array $x$ is changed to a new array $x'$, which replaces $x$ with a probability set by the Metropolis relation [2]:

$$P(x \to x') = \begin{cases} 1, & \text{if } f(x) > f(x') \\ e^{-\frac{f(x') - f(x)}{T}}, & \text{if } f(x) \le f(x') \end{cases} \qquad (1)$$

where $T$ ($T_{min} \le T \le T_{max}$) is an analogy of temperature. The temperature $T$ is limited by its maximal value $T_{max}$ and its minimal value $T_{min}$, and it is decreased by multiplying it by a constant $\gamma$, $0 < \gamma < 1$. The variable $h$ is the number of trials for a given temperature $T$ and the variable $k$ is the number of successful trials. The maximal value $k_{max}$ can be selected from several hundred to several thousand; no exact rule exists for it. The maximal number of trials is $h_{max} = 10 \cdot k_{max}$. The function Rnd generates random numbers from the interval (0,1). The array $x^*$ stores the best result found during the run of the algorithm.

The SA algorithm can be expressed as follows:

1.  Create the initial array x randomly
2.  T = T_max; x* = x; k = 1
3.  h = 0; k = 0
4.  h = h + 1
5.  x' = random modification of x
6.  if f(x') < f(x), then P = 1, else P = exp(-(f(x') - f(x)) / T)
7.  if Rnd < P, then x = x'; k = k + 1; if f(x) < f(x*), then x* = x
8.  if h < h_max and k < k_max, then go to step 4
9.  T = γ·T
10. if T > T_min and k > 0, then go to step 3
11. End
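
The steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the default parameter values and the toy objective (an integer array pulled towards 3) are assumptions chosen only to make the sketch runnable.

```python
import math
import random


def simulated_annealing(f, x0, perturb, t_max=100.0, t_min=0.01,
                        gamma=0.9, k_max=200, rng=None):
    """Minimise f following steps 1-11; returns (best_x, best_value)."""
    rng = rng or random.Random()
    h_max = 10 * k_max                 # trials allowed per temperature
    x = x0
    best = x0                          # x*: best array found so far
    t = t_max                          # step 2
    while True:
        h = k = 0                      # step 3
        while h < h_max and k < k_max:             # loop test of step 8
            h += 1                                 # step 4
            x_new = perturb(x, rng)                # step 5
            delta = f(x_new) - f(x)
            p = 1.0 if delta < 0 else math.exp(-delta / t)   # step 6
            if rng.random() < p:                   # step 7
                x = x_new
                k += 1
                if f(x) < f(best):
                    best = x
        t *= gamma                                 # step 9
        if not (t > t_min and k > 0):              # step 10
            break
    return best, f(best)


# Toy problem (hypothetical): pull every element of an integer array to 3.
def f(x):
    return sum((v - 3) ** 2 for v in x)


def perturb(x, rng):
    """Change one randomly chosen element by +1 or -1."""
    y = list(x)
    i = rng.randrange(len(y))
    y[i] += rng.choice((-1, 1))
    return y


best, value = simulated_annealing(f, [10, -7, 0], perturb,
                                  rng=random.Random(1))
print(best, value)
```

For routing, `x` would encode a routing matrix and `f` the cost of the corresponding connection.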


4.  Routing  in  broadband  telecommunication  network 

To create a multipoint connection in a broadband telecommunication network [5,9,11,14,15], the nodes of the network that demand the connection must be connected with a minimal cost of connection. The algorithm is applied to the randomly selected network in Fig. 2. Table 1 describes the cost matrix of this network according to the actual state of the network. Marked links allow transmission in one direction only. The bandwidth of all links is sufficient to create the connection.

Fig. 2  Broadband telecommunication network

The following part describes an example of routing in a broadband telecommunication network using the Simulated Annealing algorithm. The connection is designed in the network in Fig. 3 between nodes 7, 10 and 11. The costs of the connection using individual links are given in Table 1.

The dependency of the cost of the connection on the parameters of the SA algorithm ($T_{min}$, $T_{max}$, $k_{max}$) is presented in Table 3. The parameters of the algorithm must be set so that an optimal or near-optimal solution is found as quickly as possible. Two computing times are recorded for each designed connection. The first is the time at which the optimal solution was found. The second is the time at which the algorithm was stopped. The aim is to stop the algorithm as soon as the optimal solution is found. The values marked in grey satisfy this condition. Table 4 shows that when the parameters of the algorithm are set to their limit values, the algorithm does not always reach the optimal solution. If optimal solutions are needed, the parameters of the algorithm must be moved into the optimal area. Fig. 3 shows the design of the optimal connection with cost 340. The optimal matrix of the designed connection for multipoint routing between nodes 7, 10 and 11, with a connection cost of 340, is given in Tab. 6.

Fig. 3  Optimal connection - cost 340

Tab. 6  Connection matrix


5.  Conclusion

The results show that the effect of the individual evolution optimisation algorithms in the design of a connection in a broadband telecommunication network depends on the size and structure of the network. For a small network the Hill Climbing algorithm can be sufficient; for a large network Simulated Annealing or Evolution Strategy is appropriate, depending on the available computation capacity. The computing times of the algorithms cannot be compared objectively, because they depend on the software and hardware implementation of each algorithm. The Simulated Annealing algorithm can be accelerated by using faster hardware. Evolution Strategy can be distributed over a multiprocessor system, where each individual in the population is computed by its own processor. The computing time can also be reduced by combining individual algorithms.

A good network design requires a compromise between the kind of algorithm, the cost of the connection and the computing time of the designed connection. That is, the designed connection should approach the optimal connection while the computing time still meets the criteria required for routing in a broadband telecommunication network.

Abstract. This paper presents an assessment of the performance of a radio over fibre link between a remote antenna unit and a base station. A novel technique using direct sequence spread spectrum is proposed to minimise the intermodulation distortion (IMD) by decreasing the signal amplitude prior to direct modulation of a laser diode. The results show that the optical fibre microcellular system outperforms the full wireless system.


I.  INTRODUCTION

The transmission system that uses both radio and optical fibre elements, so-called radio over fibre, has been the subject of many investigations [1-4]. In mobile communication systems, the microcellular system, whose cell size is reduced to several hundred metres [5-11], has been proposed in order to cope with a rapid increase in the number of subscribers. The requirement for both huge capacity and multimedia (broadband) applications will lead to further reduction of the cell size. The reduction in cell size improves the frequency utilisation efficiency, reduces the power consumption of radio equipment and makes portable sets smaller. However, this implementation needs many radio base stations (RBSs) collecting and delivering RF signals, and an effective connection among so many RBSs. To solve these technical problems, it is proposed that microcells over a wide area are connected via optical fibres, and that radio signals are transmitted over optical fibre links between the RBSs and the control station (CS). Analogue optical links are ideal for this application because of the fibre's exceptionally low loss, high bandwidth and low distortion in the 1.3 and 1.55 µm bands. By modulating the received RF signal onto a lightwave, high-fidelity transport of these signals without repeaters or optical amplifiers can be achieved over many tens of kilometres. Furthermore, the RBS is equipped only with an electric-to-optic converter (E/O) and an optic-to-electric converter (O/E), and all of the complicated functions, such as RF modulation and demodulation, frequency assignment, spectrum delivery switching and so on, are performed at the CS, as shown in figure 1.

Subcarrier multiplexing (SCM) allows the radio frequency carriers to modulate a laser diode directly and be transported over the optical fibre without the need for frequency conversion and multiplexing/demultiplexing functions. The presence of a non-linear device, such as a laser diode (LD), causes mixing of the frequency-division multiplexed (FDM) subcarriers and creates new frequencies [10-18], some of which may coincide with the subcarrier of interest. These frequencies are an additional source of noise, commonly referred to as the intermodulation distortion (IMD) noise. Several techniques have been proposed to reduce the impact of harmonic and intermodulation distortion in modulated semiconductor LDs, such as pre-distortion and FM modulation techniques [5,6].

The  proposed  distributed  antenna  (DA)  system  is  also  compatible  with  existing  analogue 
and  digital  cellular  systems  and  upgradable  to  future  standards.  It  can  increase  the  user 
capacity  without  adding  expensive  base  stations.  A  low  cost,  simple,  and  compact  optical 
transceiver  with  a  simple  omnidirectional  antenna  can  be  located  anywhere  within  a  normal 
macrocell,  where  the  traffic  demand  is  high  or  the  proper  signal  reception  is  difficult  due  to 
shadowing.  Thus,  DA  systems  can  be  extended  to  cover  the  indoor  PCS  as  well. 

The benefits of the analogue optical fibre link as a subsystem in microcell designs, as well as in antenna-remoting applications in general, are:

•  High  bandwidth 

•  Low  loss 

•  Ease  of  installation  compared  with  copper  cables 

•  Insensitivity  to  EMI 

•  Simplicity  of  design 

•  Reliability 

In considering a suitable method of analysing the intermodulation distortion in a GSM radio-over-fibre system, a number of factors must be considered. Firstly, the number of subcarriers in the system is important. In a picocellular system there may be only a single frequency, in which case there may be no IMD to consider (neglecting interference from distant base stations). However, any base station that is required to process more than one frequency may or may not, depending on the frequency allocation, suffer from IMD. Previous analyses all relate to two-tone analysis and, although they consider the effects of more than two carriers, the work is based upon the assumption that triple-beat distortion will be 6 dB higher than two-tone distortion. The analysis allowed for an unlimited number of subcarriers, but assumed equal amplitude. However, the relative amplitude of the subcarriers will have a bearing on the intermodulation effects, and uplink carrier amplitudes will not be equal. A second consideration is the bandwidth, frequency spacing and modulation methods of the carriers which are multiplexed. The GMSK carrier modulation that is used in the GSM system can be considered to be a Gaussian-filtered FM signal. Hence, the frequency-modulated, (electrically) frequency-division multiplexed carrier (FM/FDM) model could be applicable as a simple approximation.

This  paper  presents  a  GSM-specific  assessment  of  the  performance  of  a  radio  over  fibre 
link  between  a  remote  antenna  unit  and  GSM  base  station.  The  combination  of  GSM 
physical  layer  model  and  optical  environment  model  within  the  same  simulation  package 
allows  a  comprehensive  evaluation  of  performance  of  the  fibre-link  under  specific  signalling 
conditions.  A  novel  strategy  using  spread  spectrum  has  been  proposed  to  decrease  the  IMD 
parameters,  which  consequently  improves  the  carrier  to  noise  ratio  of  the  system. 

II.  RADIO OVER FIBRE PHYSICAL LAYER MODEL

A.  GSM  Physical  Layer  Model 

A  comprehensive  model  of  the  GSM  physical  layer  is  simulated  from  the  Speech  and 
Channel  Coder,  through  interleaving,  modulation,  filtering  and  transmission  into  the  air.  A 
radio  environment  model  includes  multipath  fading  as  a  function  of  operating  frequency  and 
mobile  speed.  In  addition,  co-,  adjacent-,  and  alternate  channel  interference  and  thermal 
noise  are  simulated.  Reception  includes  a  Viterbi  equaliser  followed  by  a  channel  and  bit- 
accurate  speech  decoder.  Many  performance  measures  are  available  including  bit-error  rate 
(BER)  and  frame  erasure  rate  (FER).  The  layer  is  implemented  in  block  form  under  the 
discrete-time  simulation  software  Signal  Processing  WorkSystem®  (SPW®),  as  shown  in 
figure  2  [19]. 

B.  Optical  Fibre  Link  Model 

A model of a single-mode Fabry-Perot semiconductor laser that predicts laser performance in analogue transmission has been developed, as shown in figure 3. The nonlinearity model of the semiconductor laser diode is based on simulation by the memoryless method using a Taylor series. The model can therefore be used to simulate amplitude and frequency modulation of the laser to allow investigation of characteristics such as light versus current, step response, and harmonic and intermodulation distortion. The use of discrete-time simulation software, in this case Signal Processing WorkSystem®, allows easy simultaneous solution of the nonlinear equations in a block-oriented manner. Figure 3 shows the constituent blocks of the optical environment. The laser driver block in figure 3 allows the provision of a signal pre-processing scheme such as some form of predistortion, secondary modulation or signal spreading [5,6].
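
How a memoryless Taylor-series nonlinearity generates intermodulation products can be illustrated with a short two-tone test in Python. This is only a sketch under stated assumptions: the third-order polynomial, its coefficients `a1`, `a2`, `a3`, the tone frequencies and the function names are hypothetical and do not come from the paper or from SPW.

```python
import math


def laser_output(i, a1=1.0, a2=0.05, a3=0.02):
    """Memoryless third-order Taylor model: P = a1*i + a2*i^2 + a3*i^3."""
    return a1 * i + a2 * i ** 2 + a3 * i ** 3


def tone_amplitude(samples, freq, fs):
    """Amplitude of one frequency component via a single-bin DFT."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq * k / fs)
             for k, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq * k / fs)
             for k, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n


fs, n = 1000, 1000             # sample rate (Hz) and number of samples
f1, f2, amp = 100, 110, 0.5    # two equal-amplitude tones
drive = [amp * (math.cos(2 * math.pi * f1 * k / fs) +
                math.cos(2 * math.pi * f2 * k / fs)) for k in range(n)]
out = [laser_output(i) for i in drive]

fundamental = tone_amplitude(out, f1, fs)
imd3 = tone_amplitude(out, 2 * f1 - f2, fs)   # third-order product at 90 Hz
print(f"IMD3 relative to carrier: {20 * math.log10(imd3 / fundamental):.1f} dBc")
```

The third-order product at 2·f1 − f2 falls next to the carriers, and its amplitude grows with the cube of the drive amplitude, which is why lowering the drive level reduces the IMD.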

III.  CARRIER-TO-NOISE RATIO (CNR)

A.  Principles  of  Radio  over  Fibre 

The  goals  of  analogue  system  designers  are  large  bandwidth,  low  distortions,  large 
signal-to-noise ratio (SNR), and large spurious free dynamic range (SFDR). The SFDR of
an  analogue  optical  link  is  the  range  of  radio  frequency  (RF)  input  powers  for  which  a  two- 
tone  RF  input  signal  could  be  clearly  distinguished  from  noise  and  nonlinearities  at  the  link 
output. 

The most important characteristic of the analogue optical link is the carrier-to-noise ratio (CNR), defined as the ratio of rms carrier power to rms noise power at the output of the optical receiver, and given by

$$\mathrm{CNR} = \frac{\text{Carrier power}}{\langle I^2_{source} \rangle + \langle I^2_{shot} \rangle + \langle I^2_{thermal} \rangle + \langle I^2_{imd} \rangle} \qquad (1)$$

where

$\langle I^2_{source} \rangle$ - source noise, given by

$$\langle I^2_{source} \rangle = \mathrm{RIN} \, (\Re P_r)^2 B$$

$P_r$ - average received optical power;
$\Re$ - photodiode responsivity;
$B$ - noise bandwidth of receiver;
RIN - laser diode relative intensity noise, defined by

$$\mathrm{RIN} = \frac{\langle (\Delta P_L)^2 \rangle}{P_L^2}$$

$\langle (\Delta P_L)^2 \rangle$ - the mean square intensity fluctuation of the laser output;
$P_L$ - average laser output intensity;
$\langle I^2_{shot} \rangle$ - photodiode noise, known as shot noise;
$\langle I^2_{thermal} \rangle$ - the preamplifier noise (thermal noise).

The intermodulation distortion (IMD) term, $\langle I^2_{imd} \rangle$ in (1), is an additional source of noise that arises when multiple message channels operating at different carrier frequencies, particularly in FDMA systems, are sent simultaneously over the same fibre.
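
Equation (1) and the source-noise expression can be evaluated with a small Python sketch. The function names and all numerical values (RIN of -150 dB/Hz, responsivity 0.8 A/W, 1 mW received power, 200 kHz bandwidth) are hypothetical and serve only as an illustration.

```python
import math


def rin_noise(rin_db_hz, responsivity, p_received, bandwidth):
    """Source noise <I^2_source> = RIN * (R * Pr)^2 * B, with RIN in dB/Hz."""
    rin_linear = 10 ** (rin_db_hz / 10)
    return rin_linear * (responsivity * p_received) ** 2 * bandwidth


def cnr_db(carrier_power, source, shot, thermal, imd):
    """Equation (1): carrier power over the sum of the four noise terms, in dB."""
    return 10 * math.log10(carrier_power / (source + shot + thermal + imd))


# Hypothetical link parameters, for illustration only
source = rin_noise(rin_db_hz=-150, responsivity=0.8,
                   p_received=1e-3, bandwidth=200e3)
print(source)  # mean-square noise current in A^2
```

The remaining noise terms are passed to `cnr_db` as inputs because their expressions are not given in the text.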

B.  Intermodulation  Distortion 

The IMD occurs mainly in the laser diode modulation process as a result of its nonlinear behaviour, which originates from

1.  Injection current versus output light (I-P) curve nonlinearity;

2.  Dynamic nonlinearity due to intrinsic photon-electron nonlinear interaction;

3.  Nonsymmetric threshold clipping (overmodulation distortion).

The dynamic nonlinearity effect is frequency dependent and increases with modulation frequency. If the operating frequency band of the channel is less than one octave, all the harmonic distortions and even-order IMD products will fall outside the passband. Thus third-order products at frequencies $f_i + f_j - f_k$ (triple-beat IMD products) and $2f_i - f_j$ (two-tone third-order IM products) are the most dominant; higher-order products tend to be significantly smaller. IMD increases with increasing modulation index and signal power (i.e. increasing number of users in CDMA systems) as well as with increasing number of channels in FDMA systems. Hence IMD places restrictions on the maximum average signal-to-noise ratio of a system.
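
The growth of third-order products with the number of channels can be checked by brute force. The following Python sketch assumes equally spaced carriers numbered 1..N, so that a product frequency can be represented by its channel index; the function name is hypothetical.

```python
from itertools import combinations


def third_order_products(n_carriers, channel):
    """Count third-order IMD products landing on `channel`.

    Carriers are assumed equally spaced and numbered 1..n_carriers.
    Returns (two_tone, triple_beat): the number of 2*f_i - f_j and
    f_i + f_j - f_k products that coincide with the given channel.
    """
    carriers = range(1, n_carriers + 1)
    two_tone = sum(1 for i in carriers for j in carriers
                   if i != j and 2 * i - j == channel)
    triple_beat = sum(1 for i, j in combinations(carriers, 2)
                      for k in carriers
                      if k not in (i, j) and i + j - k == channel)
    return two_tone, triple_beat


# Number of products on a mid-band channel for a few system sizes
for n in (3, 4, 8):
    mid = (n + 1) // 2
    print(n, third_order_products(n, mid))
```

The triple-beat count grows much faster than the two-tone count as carriers are added, which is why it dominates in systems with many channels.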

A new method is proposed to improve the CNR and increase the dynamic range of the system, as shown in figure 4. Direct sequence spread spectrum (DS-SS) is used to spread the output signals from the combiner, which are then used as the modulating input signal to the laser diode. The amplitude of the input signal to the laser diode after spreading is therefore lower than that of the signal before spreading, as shown in figure 5. This leads to a reduction in the relative intensity noise, shot noise and intermodulation noise, while the thermal noise is not affected. The amount of noise reduction depends on the processing gain ($G_p$) of the spread spectrum. For example, both the relative intensity noise and the intermodulation noise are reduced by a factor of $G_p^2$, while the shot noise is reduced by a factor of $G_p$.
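
The scaling of the individual noise terms with the processing gain can be sketched as below. The noise values and the function name `spread_noise` are hypothetical; only the scaling factors are taken from the text.

```python
import math


def spread_noise(rin, shot, thermal, imd, gp):
    """Scale each noise term as stated in the text for processing gain gp."""
    return {
        "rin": rin / gp ** 2,    # reduced by Gp^2
        "imd": imd / gp ** 2,    # reduced by Gp^2
        "shot": shot / gp,       # reduced by Gp
        "thermal": thermal,      # not affected by spreading
    }


# Hypothetical noise powers before spreading (arbitrary units)
before = {"rin": 4e-12, "shot": 2e-12, "thermal": 1e-12, "imd": 8e-12}
after = spread_noise(gp=10, **before)

reduction_db = 10 * math.log10(sum(before.values()) / sum(after.values()))
print(f"total noise reduced by {reduction_db:.2f} dB")
```

Once the RIN and IMD terms have been suppressed, the unchanged thermal noise sets the floor, so increasing $G_p$ further gives diminishing returns in this simple model.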
In this paper, the radio channel is assumed to be a Rayleigh fading channel corrupted by additive white Gaussian noise (AWGN), and the optical fibre is assumed to behave as a non-dispersive channel for the distance involved. The average carrier-to-noise ratio for the wireless system (in the absence of the optical fibre link) is

$$\mathrm{CNR}_{WL} = K_{mw} \frac{\sigma_c^2}{\sigma_{awgn}^2} \qquad (2)$$

where $\sigma_{awgn}^2$ is the power of the additive white Gaussian noise, $\sigma_c^2$ is the carrier signal power and $K_{mw}$ is the average microwave attenuation factor from the RBS to the CS. The total noise power $\sigma_t^2$ at the output of the IF filter in the GMSK receiver is the sum of the optical fibre system noise and the AWGN from the radio channel [3], and is expressed as

$$\sigma_t^2 = \sigma_{of}^2 + \sigma_{awgn}^2 \qquad (3)$$

where $\sigma_{of}^2$ is the noise power of the optical fibre system. The average carrier-to-noise ratio for the optical fibre microcellular system is

$$\mathrm{CNR}_{OF} = K_{OF} \frac{\sigma_c^2}{\sigma_t^2} \qquad (4)$$

where $K_{OF}$ is the total optical fibre attenuation factor.

Figure 5 shows the performance of the full wireless system and of the optical fibre microcellular system with and without CDMA. The actual performance of the non-optical fibre system will be worse than that depicted in figure 5 because, for the simulation, the link between the base station and the central station in the non-optical fibre system is assumed to be lossless, which is not the case when the actual microwave/coaxial cable link is taken into account. This means that, with an optimum design such that SNRof- SNR, the optical fibre system may outperform the non-optical fibre system, as seen in figure 5, where the exact BER of the former is less than the BER of the latter. This allows lower launch power requirements, making the handheld sets more compact (or conversely, yields longer uninterrupted service time). The lower exact BER of the optical fibre system is a consequence of the randomness in the noise power due to its dependence on the fluctuating subcarrier envelopes.


V.  CONCLUSION 

The integration of an optical fibre link within a GSM physical layer verification environment has allowed evaluation of microcellular-based radio over fibre. The paper shows that the intermodulation distortion is decreased by using the spread spectrum technique. Therefore, the optical fibre microcellular system has the potential to be one of the most successful candidates for future third-generation mobile communications, especially with the advancement of software radio networks. The improvement of the CNR in this technique is due to the inherent properties of spread spectrum, where the power of the signal, just after spreading at the transmitter, is compressed (reduced). The power compression has a positive effect on the reduction of the RIN, phase noise and IMD. The despreading process at the receiver has a positive effect on the signal by increasing its level by a value of $G_p$. This radio over fibre model currently supports the 900 MHz GSM standard, and further studies are in progress to cover 1800 MHz GSM.

Figure 1. The optical connection between a radio base station and the central base station.

Figure 2. GSM digital speech channel simulation block diagram including the additional optical fibre link model.

Figure 3. Block diagram of optical environment.

Figure 4. Block diagram of subcarrier multiplexing system using the direct sequence spread spectrum (DS-SS).

Abstract: This paper describes current implementations of wireline V-Series modems (V.34 and V.90) by contrasting them with fixed DSP and desktop modem implementations, including Host Signal Processing (HSP) variants.

I.  Introduction 

The rapidly expanding infrastructure of the Internet and intranets is driving rapid development of various web appliances beyond the personal computer (PC). These appliances require wireline or wireless connectivity. For wireline connectivity there are various new possibilities through Asymmetric Digital Subscriber Lines (ADSL), cable modems, or company local area networks (LANs). However, for the majority of subscribers in the next 5-10 years, the most common connection will be through V.34 or V.90 analog modems. With increases in processing power available from both microprocessors and digital signal processors (DSPs), various implementations of such modems are possible. This paper describes typical modem implementations, and then the structure and benefits of porting the modem to a specific communications microprocessor, which yielded a full-featured, low-cost port for embedded systems.


II.  Typical  Modems 

In  the  "early  days  of  electronics  history,"  7-8  years  ago,  when  the  V.32  (9600  bps) 
analog  modem  was  being  introduced,  typical  modem  architectures  consisted  of: 

•  A  DSP  and  memory, 

•  A  microprocessor  and  memory, 

•  A  coder  /  decoder  (CODEC)  converting  signals  between  digital  and  analog,  and 

•  A  lot  of  analog  circuitry  comprising  a  data  access  arrangement  (DAA),  which  handled 
the  conversions  in  voltage  between  the  telephone  line  and  the  CODEC,  as  well  as 
isolating  various  noise  events  that  could  happen  on  the  telephone  line. 

A  block  diagram  of  such  a  modem  is  shown  in  the  following  figure: 

Figure 1: Typical Modem Architecture, mid-1990s

The  connection  from  the  modem  to  the  device  using  the  modem  was  typically  through 
a universal asynchronous receiver transmitter (UART), such as the UART port on PCs.
The  PC  did,  and  still  does,  make  requests  for  services  from  the  modem  through  a 
command  protocol  referred  to  as  the  "AT  Command  Set,"  developed  by  Dennis  Hayes  of 
Hayes  modem  fame. 

The  progression  of  events  as  modem  speeds  increased  over  the  years  led  to  a  few 
fundamental  changes  in  modem  implementation: 

1.  CODECs were improved with better linearity and clocking mechanisms.

2.  DAA's  were  integrated  and  made  programmable  for  worldwide  capability.  In  some 
cases,  the  DAA  has  been  integrated  in  new  structures  with  the  CODEC  function. 
This  allowed  miniaturization  since  the  circuit  board  area  required  for  the  analog 
CODEC  /  DAA  functionality  has  shrunk  from  several  10’s  of  square  centimeters  to 
approximately  10  square  centimeters.  Performance  of  such  integrated  components 
has  also  improved,  and  cost  has  decreased  from  10's  of  U.S.  dollars  to  less  than  5  U.S. 
dollars. 

3.  The  algorithms  used  for  the  modem  function  itself  became  much  more  intensive  in 
the  requirements  for  digital  signal  processing.  A  good  description  of  the  types  of 
algorithms  used  along  with  their  effects  and  general  processing  requirements,  is 
available  on  the  worldwide  web  (WWW)  from  Texas  Instruments  (www.ti.com). 
Although  this  application  note  is  too  old  to  be  of  much  use  in  current  applications,  it 
does  clearly  describe  many  issues  in  wireline  modem  design. 

4.  More and more implementations of modems were for PCs or the corresponding infrastructure. As PC processors became more and more powerful, it became possible to provide the signal processing on the PC host processor - usually an Intel x86 or IBM / Motorola PowerPC™ microprocessor. On the other end of the wire, at the network interface, more powerful DSP processors have lowered the cost of infrastructure modems by providing several modem channels on a single DSP or multiple DSPs on a monolithic die.

5.  V.34 (28.8 - 33.6 Kbps) / V.90 (56 Kbps) DSP implementations of modems require approximately 40 MHz processing speed for a single-clock multiply-accumulate (MAC) DSP chip, while an advanced Pentium or PowerPC will use approximately 60 MHz of its available bandwidth to run the modem function. In order to be able to run browsers or other application programs, it is usually not advisable to run the modem in a host signal processing (HSP) implementation unless the host processor is running at 200 MHz or faster. Many customers still feel comfortable with a hybrid approach where the high-MIPS DSP functions are still done on a DSP / CODEC / DAA board attached to the PC through an ISA or PCI interface. In this environment, the bulk of the modem code exists in the PC memory space and is run on the host computer rather than on a dedicated modem controller.

6.  The memory requirements for modem implementations increased, both for the baseline modem function and for typical features like speakerphone, voicemail, caller identification, etc. Memory requirements for a V.34 modem were on the order of 500 KB, while a V.90 modem will typically use close to 1 MB. It is worthwhile to note that the signal processing portion of the code, the data pump, only requires a few tens of KB of scratchpad and code space.

7.  Many  modems,  including  of  course  HSP  modems,  have  become  programmable. 
Much  of  this  is  for  features,  like  answering  machine  functionality,  but  in  the  transition 
from  V.34  to  V.90  it  also  became  a  big  selling  point.  The  main  reason  that  this 
happened  was  because  there  were  two  competing  proposals  for  the  V.90  standard,  and 
suppliers  wanted  to  deliver  the  modems  but  had  to  promise  upgrades  to  conform  to 
the  standard.  This  means  that  the  memory  used  for  the  DSP's  is  now  usually  RAM  or 
non-volatile  memory  versus  the  typically  ROM-based  modems  of  the  early  1990's. 

III.  Implementing  Infrastructure  DSP  Modems 

Infrastructure  modems  try  to  put  as  many  modems  into  as  small  a  space  as  possible. 
Usually  the  various  phone  lines  are  switched  to  the  DSP  "channel  bank"  through  a  serial 
structure  like  a  PCM-30  interface.  (PCM  stands  for  pulse-code  modulation,  and  here  the 
channels  are  usually  64  Kbps  channels  formed  by  sampling  the  phone  line  at  the  switch  at 
8K  samples  per  second  with  8-bit  resolution.  This  is  the  standard  method  for  converting 
an  analog  voice  line  into  a  digital  signal  for  call  routing  through  the  switching  network.) 
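
The channel rate quoted above follows directly from the sampling parameters, as this short Python sketch shows. The 32-time-slot frame used for the aggregate rate is background knowledge about PCM-30 (E1) links and is not stated in the text; the variable names are hypothetical.

```python
SAMPLE_RATE_HZ = 8000     # samples per second, from the text
BITS_PER_SAMPLE = 8       # resolution, from the text

# Rate of one PCM channel
channel_rate_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE
print(channel_rate_bps)   # 64000 bit/s, i.e. 64 Kbps

# Assumption: a PCM-30 frame carries 32 time slots of 8 bits each,
# 30 of which are voice channels.
TIME_SLOTS_PER_FRAME = 32
frame_rate_bps = channel_rate_bps * TIME_SLOTS_PER_FRAME
print(frame_rate_bps)     # 2048000 bit/s
```

Each DSP then only needs to pick its assigned time slot(s) out of every frame.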

The  PCM  interface  is  interconnected  to  multiple  DSPs  and  control  software  tells 
each  DSP  what  time  slot(s)  it  is  supposed  to  function  as  a  modem  for. 

The major difference in infrastructure modem requirements relates to the implementation of the modem control code and to making the modem code efficient. As previously mentioned, modem control code is quite a large body of code and, for a normal microprocessor doing V.90, for example, can typically be run on a 10 MIPS machine. If the control software is run on a DSP, the architecture is usually not well suited to dealing with lots of off-chip code. This typically leads to one of several implementations:

1.  An implementation where the DSP does the data pump and control code with lots of on-chip memory, increasing the cost of the DSP significantly,

2.  An implementation where there is a control processor handling modem control code for a large number of DSPs, each one running one or more data pumps.

The  second  implementation  is  the  most  frequently  applied.  New  infrastructure 
channel  banks  might  have  32  -  64  channels  handled  by  8  -  16  DSP’s  and  a  single  control 
processor. 

IV.  Implementing  HSP  Modems 

HSP modems typically target one of two markets: embedded applications or desktop applications.

In the desktop market, the biggest issue with an HSP modem is getting it to perform correctly with desktop operating systems such as Windows or MacOS. These are not real-time operating systems, and they do not guarantee performance at the sample rate of modem CODECs, which generate thousands of interrupts per second. There is also a wide range of software applications that could be running in the desktop environment and that could interfere with the required performance.

Nevertheless,  several  manufacturers,  including  Motorola,  have  successfully  released 
HSP  modems  for  the  PC.  The  only  concession  that  is  required  is  a  small  integrated 
circuit  that  maps  the  CODEC  /  DAA  to  the  PC  input  bus,  typically  ISA  or  PCI,  and  which 
provides  a  hardware  first-in  first-out  (FIFO)  buffer  to  lower  the  required  interrupt  rate  for 
HSP  processing  into  an  acceptable  range.  These  implementations  also  rely  on  the  fact 
that  the  newer,  faster  desktop  processors  also  provide  enhanced  mathematics  processing 
capability  to  perform  the  DSP  functions. 
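
The effect of the hardware FIFO on the interrupt rate can be estimated with simple arithmetic. The sketch below assumes one interrupt per full FIFO; the 8,000 samples-per-second CODEC rate and the FIFO depth of 32 samples are illustrative assumptions, not figures from any specific product.

```python
def interrupt_rate_hz(sample_rate_hz: int, fifo_depth_samples: int) -> float:
    """Interrupts per second when the host is interrupted once per full FIFO."""
    if fifo_depth_samples <= 0:
        raise ValueError("FIFO depth must be positive")
    return sample_rate_hz / fifo_depth_samples


def buffering_delay_ms(sample_rate_hz: int, fifo_depth_samples: int) -> float:
    """Extra latency added by waiting for the FIFO to fill."""
    return 1000 * fifo_depth_samples / sample_rate_hz


# Assumed values: 8,000 samples/s CODEC, 32-sample FIFO.
print(interrupt_rate_hz(8_000, 1))    # no FIFO: one interrupt per sample
print(interrupt_rate_hz(8_000, 32))   # with FIFO
print(buffering_delay_ms(8_000, 32))  # latency cost of the FIFO
```

The trade-off is that a deeper FIFO lowers the interrupt rate but adds buffering delay to the modem data path.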

For the embedded market, HSP processing is done only on the newer, faster embedded RISC processors. These processors typically have much smaller on-chip caches and much smaller associated system memories than desktop processors. Embedded processors do use real-time operating systems (RTOS), so they can typically take advantage of the real-time nature of the operating system to achieve results similar to a desktop processor for modem operation; HSP can be done with approximately 60 MHz of embedded processor bandwidth. However, the smaller caches and slower overall system memory usually mean that applications running on top of the modem function are much slower than they would be on a desktop machine.

The  remainder  of  this  paper  points  out  how  these  limitations  were  overcome  with  a 
specific  embedded  microprocessor  architecture. 
V. An Embedded Internet Appliance Modem Implementation

A  target  internet  appliance  microprocessor  architecture  is  shown  in  Figure  3  and  is 
implemented  in  the  Motorola  MPC8XX  microprocessor  family. 

Internet appliances can include PCs, but also screen phones and voice-over-IP phones, beverage machines, television browser boxes, automobiles, planes, cellular phones, aircraft carriers, and refrigerators. An integrated device such as this is typically used where a user interface (display) and communications are required.

Systems may be required to function as a bridge between one communications channel and another. For example, an internet phone might act as a wireline modem for a PC connected to the internet phone through the PC's Ethernet channel. The system could also include a USB connection to a local printer, or an ATM connection to a digital set top box.


Figure  2:  MPC8XX  Embedded  PowerPC™  Microprocessor  Architecture 

Implementing  a  modem  on  such  a  device  was  a  challenge,  because  one  of  the  targets 
was  to  have  a  system  which  can  be  running  an  analog  modem  and  a  browser  written  in 
Java. 

Java  is  well  known  for  its  requirement  to  have  lots  of  processing  capacity  available 
to  it  to  operate  well.  Also,  the  on-chip  RAM  availability  in  the  communications  block  is 
very  small,  with  only  8KB  of  dual  port  memory  shared  between  the  processor  and  the 
communications  processor  module  (CPM)  RISC.  Further,  the  on-chip  ROM 
(microROM)  in  the  communications  block  could  not  all  be  dedicated  to  the  modem  as  it 
holds  lots  of  routines  for  the  various  communications  channels. 

The first step toward a solution was to add a multiply-accumulate (MAC) block to the arithmetic logic unit (ALU) of the RISC microcontroller in the CPM block. This MAC was to be used in conjunction with special routines in the microROM to implement the modem in a balance between the main microprocessor and the CPM microcontroller, with DSP-intensive algorithms performed on the MAC in the CPM.

The method for specifying the interface to DSP routines in the microROM was to copy the method for interfacing to the communications channels. Specifically, the communications software running on the microprocessor deals with the communications channels through buffer descriptor rings. These buffer descriptor rings point to data buffers that are located in system memory. Buffer descriptors are 8 bytes in length and contain:

•  Control bits,

•  Buffer pointers, and

•  Length information.
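
An 8-byte buffer descriptor of this kind can be sketched as a packed record. The field order and widths below (16-bit control word, 16-bit length, 32-bit buffer pointer, big-endian) are an assumed layout for illustration; the text states only the total size and the kinds of fields.

```python
import struct

# Assumed layout: 16-bit control bits, 16-bit data length, 32-bit buffer pointer.
BD_FORMAT = ">HHI"
BD_SIZE = struct.calcsize(BD_FORMAT)


def pack_bd(control: int, length: int, pointer: int) -> bytes:
    """Serialize one buffer descriptor into its 8-byte form."""
    return struct.pack(BD_FORMAT, control, length, pointer)


def unpack_bd(raw: bytes) -> tuple[int, int, int]:
    """Return (control, length, pointer) from an 8-byte descriptor."""
    return struct.unpack(BD_FORMAT, raw)


# Hypothetical descriptor: a control flag in the top bit, a 256-byte buffer
# located at an arbitrary example address in system memory.
raw_bd = pack_bd(0x8000, 256, 0x0010_0000)
print(BD_SIZE, unpack_bd(raw_bd))
```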

Function  descriptors,  as  they  are  called  when  they  are  in  the  DSP  rings,  are  also  8 
bytes  in  length  and  contain: 

•  Control  bits, 

•  An  op-code  for  the  DSP  function  requested, 

•  Modulo  information  where  needed, 

•  The  number  of  times  the  function  is  to  be  run,  and 

•  Pointers  for  input  data,  output  data,  and  coefficients  for  the  routine. 

In  the  case  of  communications  functions,  data  buffers  can  typically  be  very  large  and 
are  usually  located  in  system  memory  while  buffer  descriptors,  which  need  to  be  accessed 
quickly  when  servicing  a  channel,  are  located  in  on-chip  RAM. 

For  DSP  functions  the  order  is  reversed:  function  descriptors  are  infrequently 
accessed.  Data  and  coefficients  are  relatively  small  and  are  accessed  at  high  speed  in  the 
inner  loops  of  DSP  processing.  Therefore  it  was  decided  that  the  DSP  data  and 
coefficients  would  be  in  on-chip  RAM  while  the  function  descriptors  would  be  in  off- 
chip  RAM. 

The  time  for  getting  the  next  function  descriptor  is  hidden  from  the  system  by  using 
the  DMA  to  fetch  the  next  descriptor  in  the  middle  of  processing  the  present  descriptor. 
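
A simple timing model shows why overlapping the descriptor fetch with processing hides the fetch latency. The function below is a sketch under the assumption that every descriptor takes the same time to fetch and that only one fetch can be outstanding; the cycle counts in the example are invented for illustration.

```python
def total_cycles(process_cycles: list[int], fetch_cycles: int, prefetch: bool) -> int:
    """Total time to run a list of descriptors, with or without DMA prefetch."""
    if not process_cycles:
        return 0
    if not prefetch:
        # Each descriptor is fetched, then processed, strictly in sequence.
        return sum(fetch_cycles + p for p in process_cycles)
    total = fetch_cycles  # the very first fetch cannot be hidden
    last = len(process_cycles) - 1
    for i, p in enumerate(process_cycles):
        # While descriptor i is processed, descriptor i + 1 is fetched by DMA;
        # the slower of the two determines when the next one can start.
        total += p if i == last else max(p, fetch_cycles)
    return total


# Invented example: three routines of 50 cycles each, 10 cycles per fetch.
print(total_cycles([50, 50, 50], 10, prefetch=False))
print(total_cycles([50, 50, 50], 10, prefetch=True))
```

As long as a fetch takes no longer than processing the current descriptor, only the first fetch contributes to the total.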

Since it is expected that the CPM will be concurrently handling other channels, like Ethernet and USB, while taking care of modem tasks, it was decided to write the DSP routines as standalone blocks. Thus each DSP routine goes through a setup procedure initializing the CPM pointers and counters and then processes the inner loop of the algorithm. When the inner loop is done, the routine is "put away," much like calling an object in Java or C++, and the CPM interrupt controller is checked for pending tasks. FIFOs in the communications channels and the operating speed of the CPM are balanced to make sure that the channels operate correctly in this environment.

In  many  respects  this  worked  very  well.  A  typical  CPM  DSP  routine  was  on  the 
order  of  50  instructions  so  a  small  amount  of  on-chip  ROM  could  hold  many  DSP 
routines.  Also,  a  single  routine  could  be  re-used  multiple  times  on  different  data.  The 
MAC  is  also  well  tuned  and,  in  complex  routines,  averages  very  close  to  one  MAC  per 
clock. 
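
The shape of such a routine (setup, a MAC inner loop, then return) can be illustrated with an FIR filter step that uses modulo addressing on its delay line. The FIR filter is chosen here only as a familiar example; the paper does not list the actual routines, and the function name `fir_step` is hypothetical.

```python
def fir_step(delay_line: list[int], coeffs: list[int],
             write_index: int, sample: int) -> tuple[int, int, int]:
    """Compute one FIR output; returns (output, next_write_index, mac_count)."""
    taps = len(coeffs)
    if len(delay_line) != taps:
        raise ValueError("delay line and coefficient list must be the same length")
    # Setup: store the newest sample and initialise accumulator and pointer.
    delay_line[write_index] = sample
    acc = 0
    idx = write_index
    macs = 0
    # Inner loop: one multiply-accumulate per tap.
    for c in coeffs:
        acc += c * delay_line[idx]
        idx = (idx - 1) % taps  # modulo addressing walks back through history
        macs += 1
    # "Put away": hand back the result and the state for the next call.
    return acc, (write_index + 1) % taps, macs


# Feed a constant input of 1 into a 3-tap filter with coefficients 1, 2, 3.
line = [0, 0, 0]
w = 0
outputs = []
for _ in range(3):
    y, w, macs = fir_step(line, [1, 2, 3], w, 1)
    outputs.append(y)
print(outputs, macs)
```

Because the routine keeps all of its state in its arguments and return values, the same code can be reused on different data, as described above.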
One cloud early on in the project, which ended up having a silver lining, was that the interrupt load and synchronization requirements for having the microprocessor and CPM microcontroller share tasks back and forth in the data pump were too high. But a quick analysis showed that it was possible to use the microprocessor to do the state transitions for the data pump during training. This kept the size of the routines required to be in on-chip memory at any given time small. There was also sufficient time for the microprocessor to re-load the CPM memory with the next state and to assist in training.

Finally,  since  this  work  was  being  done  while  the  V.34  and  V.90  standards  were  in 
progress,  it  was  also  necessary  to  be  flexible  in  some  DSP  routines;  they  couldn't  all  be 
put  into  microROM. 

This  requirement  was  handled  by  the  capability  of  the  device  to  have  "microROM 
routines"  actually  function  from  the  on-chip  RAM.  This  is  called  "download  ROM." 


The  end  results  show  the  effectiveness  of  this  approach  with  the  following  statistics: 


Microprocessor MIPS required in data mode:                         less than 10
Microprocessor MIPS required during training:                      less than 35
Microcontroller peak MIPS, V.34 or V.90:                           less than 22
Interrupts to microprocessor during data mode:                     0
Peak on-chip RAM requirements including download RAM,
coefficients, inputs, outputs:                                     less than 5 KB
Total microROM DSP code space:                                     less than 2K words
Total function code list, combined transmit and receive chain:     less than 1 KB


VI.  Summary 

This  paper  has  described  typical  DSP  and  microprocessor  implementations  of 
modems,  and  has  then  described  a  unique  implementation  of  wireline  modems  on  a 
communications  microprocessor.  Contrasted  to  HSP  implementations,  this 
implementation  leaves  the  embedded  microprocessor  virtually  free  to  handle  user  tasks 
while  the  communications  channel  is  kept  open  by  a  flexible  communications  processor. 
When  compared  to  a  pure  DSP  implementation,  the  cost  is  kept  low  by  keeping  on-chip 
memory  requirements  low  through  a  set  of  unique  implementation  characteristics. 
Abstract. From 1995 to 1998, the Department of Telecommunications of the Faculty of Electrical Engineering and Information Technology at the Slovak Technical University in Bratislava worked on the tasks of private telecommunications networks and services as part of the branch project 101/240/1995.

The Department's aim, within the Faculty of Electrical Engineering and Information Technology at the Slovak Technical University in Bratislava, was to build up a modern technological workplace that supports in particular the scientific, technological and educational activities of the university, the workplaces of Slovak Telecom, the Research Institute of Telecommunications, and also companies concerned with implementing telecommunications technology in the PSTN of the Slovak Republic.


1  The Pilot Project: Connecting the Department of Telecommunications at the Faculty of Electrical Engineering and Information Technology to the ATM Network

The natural continuation of the project is to apply its results successfully to activities in the area of broadband telecommunications. In this area our workplace will also have to fulfil important functions in the future (science, research and education). The pilot project of building an ATM (Asynchronous Transfer Mode) node at the Department of Telecommunications, Faculty of Electrical Engineering and Information Technology in Bratislava, is therefore a necessary and correct step.

The project Private telecommunications networks and services itself, as it was drafted and carried out at the Department of Telecommunications, Faculty of Electrical Engineering and Information Technology in Bratislava, has created favourable conditions for continuing with other scientific and technological tasks, especially those oriented towards broadband telecommunications systems, networks and services.

The task comprised the following:
1. Building the ATM node (switch) in the experimental and educational laboratory of the Department of Telecommunications, Faculty of Electrical Engineering and Information Technology in Bratislava, and connecting it to the area-wide ATM network in the Slovak Republic.

2. Completing the existing ISDN technology (PABX A4300L) and the LAN technology with hardware and software modules for broadband communication in connection with the ATM node.

3. Extending the existing videoconference facilities (N-ISDN) in the experimental and educational laboratory of the Department of Telecommunications, Faculty of Electrical Engineering and Information Technology in Bratislava, followed by building up a videoconference workplace.

4. Along with point 2, interconnection into the ATM network of the Slovak Republic. The present activities should aim to create a functional interconnection with the Technical University in Kosice and the Research Institute of Telecommunications in Banska Bystrica, and to create conditions for experimental verification of long-distance broadband videoconference transmission and fast data transmission (for optional applications, including tele-education). The transmission speeds used for these cases should make it possible to carry out test activities that verify application services.

5. In connection with the established ATM node, realising and verifying in the laboratory technologies for the wireless access network, in cooperation with the Research Institute of Telecommunications (Banska Bystrica). The experimental activities are oriented towards applying these technologies to the flexible realisation of broadband videoconferencing and fast data transmission (for example in the health service).

6. Within the project, solving the transmission of other protocols through ATM (Frame Relay for voice and data, IP for Internet and intranet, etc.), and verifying these activities in accordance with the needs and requirements of the Department of Telecommunications, Faculty of Electrical Engineering and Information Technology in Bratislava.


In connection with the project Private telecommunications network and services, the aim of the planned task was to build up the technological and functional infrastructure at the Department of Telecommunications, Faculty of Electrical Engineering and Information Technology in Bratislava, covering the problems of broadband networks and services in relation to the B-ISDN process.

The results of this aim create realistic preconditions for developing cooperation with Slovak Telecom, state enterprise, in the coming years, and provide a technical basis for fulfilling the most difficult tasks.

Because the task was formulated in the 4th quarter of 1998, its content and organisational aims were subordinated to the shortage of time and to the real opportunities to build up the technological ATM node at the Department of Telecommunications in Bratislava in cooperation with Slovak Telecom, and to connect this node to the ATM network in the Slovak Republic. This task formed the core of the Pilot Project.
2  Conditions of realization of the Pilot Project

The project team aimed to tie the realisation and verification of the tasks to a real technical environment: the broadband ATM network, in operation since September 1998 at Slovak Telecom and across the whole area of the Slovak Republic.

This meant interconnecting the telecommunication and computer technologies of the Department of Telecommunications, Faculty of Electrical Engineering and Information Technology in Bratislava, with the ATM system of the network of the Slovak Republic (metropolitan network Bratislava).

The  main  tasks  of  the  whole  Pilot  Project  were  divided  into  three  parts: 

1. Building the ATM node (switch) in the experimental and educational laboratory of the Department of Telecommunications, Faculty of Electrical Engineering and Information Technology in Bratislava, and connecting it to the area-wide ATM network in the Slovak Republic.

2. Completing the existing ISDN technology (PABX A4300L) and the LAN technology with hardware and software modules for broadband communication in connection with the ATM node.

3. Extending the existing videoconference facilities (N-ISDN) in the experimental and educational laboratory of the Department of Telecommunications, Faculty of Electrical Engineering and Information Technology in Bratislava, followed by establishing the videoconference workplace.

To meet the needs of the Pilot Project it was necessary to secure:

1. ATM technology (switch), type APEX MAC 1 (2 x 155 Mbit/s), from the GDS company.

2. The project documentation needed to install the technology on the premises of the Department of Telecommunications, Faculty of Electrical Engineering and Information Technology in Bratislava.

3. Connection of the ATM node to the area-wide ATM network (MAN Bratislava) through optical fibre with a transmission speed of 155 Mbit/s.

4. The interface modules needed for possible interconnections of the infrastructure of the Department of Telecommunications, Faculty of Electrical Engineering and Information Technology in Bratislava, into the ATM network. The resulting set of supported interface modules is: 4 x E1, 4 x Ce, 4 x Ethernet and 1 x 155 Mbit/s.

The modular architecture of the technical equipment for the Pilot Project integrates the LAN and the ISDN branch exchange into the ATM node, which is in turn connected to the ATM metropolitan network of Bratislava.

In the case of the LAN, Ethernet is connected at 10 Mbit/s. The original intention was to connect the LAN to the ATM node through a 100 Mbit/s module, but this module is not available at this time.

In the case of the ISDN private telecommunications network based on the Alcatel A4300L PABX, the connection to the ATM node is made through the E1 module with a transmission speed of 2.048 Mbit/s, which is supported by the manufacturer, Alcatel.
Consultation with representatives of the company showed that Alcatel has a module for the PABX A4300L for a direct 155 Mbit/s connection over optical fibre (UNI 3.2). Cooperation was offered to verify this type of connection under the conditions of the Slovak Republic at the Department of Telecommunications, Faculty of Electrical Engineering and Information Technology in Bratislava.


3  Architecture  for  the  broadband  infrastructure 

There were several ways to build up the broadband infrastructure needed to solve the tasks of the Pilot Project.

The project team did not favour building an isolated broadband structure, but oriented the project towards integrating the existing computer network and ISDN private telecommunications network into the ATM environment.

The chosen architecture serves its purpose for several reasons:

1. It gives a relatively cheap, overall functional integration of the LAN and the ISDN private telecommunications network into the ATM environment.

2. It is open to the implementation of further technologies.

3. From the viewpoint of broadband applications, it presents a multi-purpose environment.

4. Above all, it provides the conditions for solving tasks concerning the coexistence and migration of the technological means into broadband applications (ATM environment).


3.1  Description  of  architecture  of  network  at  the  Department  of  telecommunications 

The architecture of the technical equipment is shown in picture 1.

As the picture shows, it consists of the ATM node (Apex MAC1), which is connected through optical fibre to the node of the metropolitan ATM network Bratislava (Nabelkova ulica), using the 155 Mbit/s optical access port.

This optical connection was made possible by connecting to the already built optical link (20-fibre connection) of the Department of Telecommunications, Faculty of Electrical Engineering and Information Technology in Bratislava, which is part of the urban optical SDH network (circuit 2, level STM1). It must be emphasised that the connection to the PSTN through the optical SDH network is at present used by the ISDN private telecommunications network A4300L (2 x PRI ISDN), and that the interconnection is directed to the EWSD switching system in Petrzalka and the S12 on SNP square.

The ATM unit MAC1 provides an optical port with an access speed of 155 Mbit/s for internal connection. We do not plan to use it in the first phase of the Pilot Project.

The ATM node was fitted with the following interface modules to connect the internal infrastructure of the Department of Telecommunications:

1. Module 4 x port Ce (n x 64 kbit/s),

2. Module 4 x port E1 (G.703),

3. Module 4 x Ethernet (10 Mbit/s).
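
A rough capacity check shows how the listed tributary ports compare with the 155 Mbit/s uplink. The sketch below assumes standard 53-byte ATM cells with 48 bytes of payload and ignores SDH framing and adaptation-layer overhead, so the uplink figure is an upper bound. The Ce ports are left out because their n x 64 kbit/s rates are not specified.

```python
ATM_CELL_BYTES = 53      # assumption: standard ATM cell size
ATM_PAYLOAD_BYTES = 48   # assumption: payload carried in each cell

E1_MBPS = 2.048          # from the text
ETHERNET_MBPS = 10.0     # from the text


def atm_payload_mbps(line_rate_mbps: float) -> float:
    """Upper bound on user payload after the 5-byte cell header is removed."""
    return line_rate_mbps * ATM_PAYLOAD_BYTES / ATM_CELL_BYTES


def tributary_demand_mbps(num_e1: int = 4, num_ethernet: int = 4) -> float:
    """Peak demand if every E1 and Ethernet port runs at full rate."""
    return num_e1 * E1_MBPS + num_ethernet * ETHERNET_MBPS


uplink = atm_payload_mbps(155.0)
demand = tributary_demand_mbps()
print(round(uplink, 3), round(demand, 3), demand < uplink)
```

Under these assumptions the E1 and Ethernet ports together stay well below the payload capacity of the uplink.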
For the purposes of the project, the LAN was connected to the location of the ATM node by metallic cabling (RJ 45 termination). The connection to the ATM node was made at an Ethernet port (10 Mbit/s).

Furthermore, the ISDN branch exchange Alcatel A4300L was connected to the location of the ATM node by metallic cabling (BNC termination). The connection was made at the ATM node to an E1 port (G.703, 2.048 Mbit/s). For this purpose the PABX was supplemented with a module for communication through an E1 port (G.703, 2.048 Mbit/s). It must be emphasised, as picture 1 documents, that the ISDN connections at PRI level (2 x 2.048 Mbit/s) between the A4300L and the PSTN remain functional; they will be used in the future to solve tasks of cooperation between the N-ISDN and B-ISDN environments (technical and functional).

For the purposes of the project, equipment for the ISDN videotelephony teleservice was verified experimentally on the S0 interface (BA ISDN) in the experimental and research laboratory. We received the equipment for the experiment from Alcatel (2 x PictureTel) and Siemens (2 x IVIEW).

In both cases the equipment supports DSS1 (Euro ISDN) signalling and, according to experiments carried out on the ISDN PABX Alcatel 4300L, can work with a digital channel of 64 kbit/s or 2 x 64 kbit/s, that is, with the BA ISDN channel structure (2B+D).

The existing videoconference facilities based on N-ISDN in the experimental and educational laboratory of the Department of Telecommunications, Faculty of Electrical Engineering and Information Technology in Bratislava, will be extended further by experimental deployment of the videoconference system VIDEO ALCATEL 3276 (3 x BA ISDN).

This system is a relatively cheap solution for creating a videoconference workplace without the need to create an H-type channel in the ISDN network.
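
The channel arithmetic behind this remark can be checked in a few lines. The B-channel rate comes from the text; the 16 kbit/s D channel of a basic access and the 384 kbit/s H0 rate are the standard ISDN values and are assumptions here, since the paper does not state them.

```python
B_CHANNEL_KBPS = 64     # from the text
D_CHANNEL_KBPS = 16     # assumption: standard basic-access D channel
H0_CHANNEL_KBPS = 384   # assumption: standard H0 channel rate


def basic_access_rates(num_accesses: int = 1) -> tuple[int, int]:
    """Return (user rate, user rate plus signalling) in kbit/s for N x (2B+D)."""
    user = num_accesses * 2 * B_CHANNEL_KBPS
    total = user + num_accesses * D_CHANNEL_KBPS
    return user, total


print(basic_access_rates(1))   # one basic access, 2B+D
print(basic_access_rates(3))   # three basic accesses, as used by the system above
```

Three basic accesses bundled together provide the same user rate as a single H0 channel, which is consistent with the remark that no H-type channel needs to be created.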


3.2  Description of ATM node implementation into ATM network in the Slovak Republic

The aim of the Pilot Project was not to create an isolated ATM node. In creating the node, we pursued the possibility of connecting it to the wideband network covering the whole of Slovakia, which is operated by Slovak Telecommunications, state enterprise. Essentially, the aim is to create the preconditions for further experimental and research projects in the technological, network and functional areas. There is a natural effort to realise and verify the assumptions of projects that present possibilities for applying broadband services in the infrastructure of society.

In these respects the Pilot Project fulfils the expectations: through the optical connection to the node at Nabelkova ulica, with an access speed of 155 Mbit/s, the ATM node at the Department of Telecommunications, Faculty of Electrical Engineering and Information Technology in Bratislava, has been connected to the metropolitan network MAN Bratislava.

At this point, considering the situation of the Pilot Project, it must be said that the aim already in the first phase is not only connection to the ATM network of the Slovak Republic, but also a wideband connection between:
1. The Department of Telecommunications, Faculty of Electrical Engineering and Information Technology in Bratislava, and the Technical University in Kosice (purpose: fast data transmission, videoconferencing, cooperation on the verification of broadband applications, ...).

2. The Department of Telecommunications, Faculty of Electrical Engineering and Information Technology in Bratislava, and the Research Institute of Telecommunications in Banska Bystrica (purpose: fast data transmission, verification of the ISDN PABX connection from the viewpoint of transparency of ISDN services, verification of the possibilities of using wireless broadband units and connecting them through the ATM network for videoconferencing in health service facilities, ...).

In the first phase, three nodes are functionally connected through the WAN and the possibility of broadband communication is verified. Picture 2 marks only the points of interest and does not show the complete topology of the ATM network in the Slovak Republic.

According to information from Slovak Telecom, state enterprise, transmission speeds of up to 20 Mbit/s can be assumed for the experimental work in 1999.


4  Prospects for utilization of the Pilot Project

We can distinguish tasks for the coming year 1999 and tasks that continue from the present year.

During the preparation of the Pilot Project, many meetings and conferences took place, not only with representatives of Slovak Telecom, state enterprise, and the Research Institute of Telecommunications Banska Bystrica, but also with representatives of foreign companies in our country (Alcatel, Siemens). Suggestions from these meetings will be included in formulating the new scientific and technical tasks for the next period.

On the other hand, we are preparing three tasks for the first half of 1999. Their successful completion is significant for us. The tasks are as follows:

1. Broadband data transmission among the workplaces of the Research Institute of Telecommunications Banska Bystrica, TU Kosice and the Department of Telecommunications, Faculty of Electrical Engineering and Information Technology in Bratislava.

2. Videoconference transmission among the workplaces of TU Kosice, the Research Institute of Telecommunications Banska Bystrica and the Department of Telecommunications, Faculty of Electrical Engineering and Information Technology in Bratislava, with verification of its possibilities.

3. Broadband videoconference transmission applied in the health service system, between the Roosevelt Hospital in Banska Bystrica and the hospital at Kramare in Bratislava. Wireless wideband units should be used for the videoconference. The base stations should be located at the Research Institute of Telecommunications Banska Bystrica and at the Department of Telecommunications, Faculty of Electrical Engineering and Information Technology in Bratislava, and the long-distance transmission will be carried over the ATM network. A diagram of this arrangement will be presented during the conference.

It is highly likely that the conditions for realisation will be verified successfully in the short term, so that the subsequent experiments can be carried out.

The task in point 3 opens opportunities for our workplaces to participate in European Information Society projects, namely in the area of the health service system (TEMEP, Telemedicine Services Delivered to the Point of Need). The project represents an extremely meaningful application of videoconferencing in a humanitarian area.

For this aim of the project, an agreement has been made between the Research Institute of Telecommunications Banska Bystrica and the Department of Telecommunications on a common procedure for establishing the conditions for the experiment in this area. The chosen workplaces include the Roosevelt Hospital in Banska Bystrica and the Cardiovascular Centre of the hospital at Kramare in Bratislava.

According to the latest specifications, realizing the videoconference itself in the
presented application requires a channel with a transmission speed of 20 Mbit/s (Siemens
+ NewBridge...).

The conception of the prepared project application is not restricted to solving the
technical connection into the access network through metallic and optical media only.
The project goes further and has the ambition to solve the interconnection between the
transport bands and the place of application with the help of broadband wireless units of
the access network.

Naturally, these questions exceed the scope of this project, so we will not specify the
technical and functional conditions here. We only remark that applying broadband
videoconferencing without the need for a fixed broadband connection into the access
network (metallic, optical) multiplies the opportunities for various applications, and we
definitely see great prospects in a project oriented this way.


5  Conclusions 

The Department of Telecommunications of the Faculty of Electrical Engineering and
Information Technology in Bratislava, as an educational institution, understands the need
for, and takes the initiative in, processes that influence the successful introduction of
modern telecommunication services into the life of society.

The workplace has solved many important research tasks and scientific projects in this
area in the recent period.

At the Department of Telecommunications we have, in a very short time, realised and
installed an ISDN private telecommunication network, well in advance of its use in
practice, and we have also realised a hybrid system (cooperation of signalling systems)
that is unique from the telecommunication system point of view.

In the near future we want to focus on the already presented application of broadband
services with our renowned partners, and also to work on preparing and defining the tasks
connected with the cooperation of N-ISDN and B-ISDN means and with the migration of
telecommunication technologies into the broadband environment (technologies,
compatibility and transparency of services, quality of provided services, supervision of
the broadband network and so on).

Although these tasks are still waiting for a solution, theoretical and practical
preparation is the best way to manage them. We therefore emphasize this Pilot Project as
the means to start theoretical and practical activities in the B-ISDN environment.

Abstract. This paper presents an overview of microelectromechanical systems (MEMS)
research topics. Manufacturing technologies, including bulk and surface micromachining,
are illustrated using fabricated devices. Optimal exploitation of these technologies
relies on the integration in a single chip of MEMS parts with their electronic interfaces.
Current challenges for bringing the analysis, design and test of this new generation of
devices in line with microelectronics are then discussed.

I.  Introduction 

Manufacturing  batch  processes  initially  developed  for  IC  fabrication  are  at  the  heart  of 
the  rapidly  growing  MEMS  technologies.  This  is  because  the  principles  governing  the 
manufacturing  of  MEMS  are  an  evolution  of  microelectronics.  Superposition  of  different 
thin  film  layers  is  used  for  implementation.  In  addition  to  electrical  phenomena,  MEMS 
exploit  other  phenomena  occurring  in  these  layers  in  the  mechanical,  thermal,  chemical, 
radiant  or  magnetic  domains. 

At  the  current  stage  of  MEMS  technology  development,  much  work  is  still  carried  out 
within  research  laboratories.  The  two  primary  classes  of  MEMS  devices  are  sensors  and 
actuators,  but  a  wide  range  of  micromachines  have  already  been  demonstrated.  Transfer 
to  industry  is  increasingly  taking  place  in  applications  such  as  accelerometers,  pressure 
sensors,  chemical  and  flow  sensors,  or  microoptics.  Optimal  exploitation  relies  on  the 
integration  in  a  single  chip  of  MEMS  parts  with  their  electronics  interfaces,  which  provides 
scope  for  very  low  cost  production  and  significant  improvement  in  system  reliability. 

As technological hurdles are removed and stable fabrication processes emerge, research
efforts shift towards the design of systems of increasing complexity, where the only
limitations are those stemming from the imagination of researchers and engineers. Before
this advanced stage can be reached, as for microelectronics, new CAD methodologies and
tools need to be developed for MEMS. MEMS design is still a very complex process, which
is not approached in the analytical hierarchical style of microelectronics. Computational
methods at a very low level of abstraction are necessary, and the lack of adequate higher
levels of abstraction makes a systematic approach to the design of larger systems very
difficult. In addition, as these systems grow in size and the levels of integration
increase, testing becomes a major barrier.

This paper provides an overview of MEMS research topics. After introducing the
most basic technologies, current challenges for bringing the design and test of MEMS
parts in line with microelectronics are discussed.

II. Fabrication methods

The  most  common  processing  techniques  for  microelectromechanical  systems  include  bulk 
and  surface  micromachining. 

CMOS-compatible bulk micromachining provides a low cost monolithic solution for
the integration of MEMS [1]. A commercial CMOS process is followed by a selective
anisotropic etching post-process. Silicon etching proceeds preferentially along certain
crystallographic planes, which have a much faster etching rate than other planes. The
selective removal of silicon etches wells in the substrate, giving rise to membranes,
cavities, masses and bridges, which are basic MEMS components for combination with
microelectronics. For example, suspended parts such as bridges and membranes have good
thermal isolation from the bulk. This is typically exploited in devices such as infrared
sensors, electrothermal converters and thermal pixels. Bulk micromachining can take
place from the front side, back side or both.

Front-side bulk micromachining has often been used by CMP since it allows low cost
maskless silicon etching. A cross section of a bulk micromachined suspended structure is
shown in Figure 1(a). During layout, areas of naked silicon exposed for micromachining
are created by stacking a contact, a via and an opening in the passivation. Suspended
structures are made of a sandwich of oxides (LOCOS, gate and interlevel oxides), nitride
passivation, polysilicon and metal levels. Anisotropic etching takes place in areas of
exposed silicon, creating a cavity with the shape of an inverted pyramid. The etchants
used include EDP, KOH and TMAH. EDP has the advantage that it does not significantly
attack aluminum or passivation layers, although it is highly toxic. KOH gives very clean
surfaces and etch planes, but has the disadvantage that it attacks aluminum. TMAH has
the fastest etching rate, and does not attack aluminum pads when silicon or silicic acid
is adequately dissolved in the solution, but pyramidal protuberances (hillocks) may
appear at the bottom of the cavities created.

Fig. 1. MEMS technologies: (a) CMOS-compatible silicon-bulk micromachining, and
(b) three-poly surface micromachining [1].

Fig. 2. MEMS examples: (a) SEM of CMOS-compatible silicon-bulk micromachined
electro-thermal converter, (b) schematics of the ETC, (c) surface micromachined 300
kHz microresonator, and (d) schematics of the microresonator.

The Electro-Thermal Converter (ETC) shown in Figure 2(a) has been fabricated by
means of this process. ETC devices are typically used as true rms converters, transferring
the root-mean-square value of an AC voltage or current to its equivalent DC value. The
schematic diagram of Figure 2(b) shows the ETC (without interface electronics) composed
of a cantilever beam which supports a heating resistor at its end and a thermopile. The
resistor transforms the electrical input into heat which flows by conduction through the
beam, by convection into the air surrounding the beam, and by radiation. The fact that
the beam is suspended leads to an increase in thermal resistance between the resistor
and the substrate (which acts as a heat sink and corresponds to thermal ground). This
increase in thermal resistance results in higher beam temperatures. The thermopile,
which senses the temperature gradient and produces an output voltage, is made of a
set of thermocouples connected in series. Each thermocouple is made of n- and p-type
polysilicon, which have different Seebeck coefficients $\alpha$. One side of each
thermocouple is at thermal ground ($T_{cold}$) and the other side is near the heating
resistor ($T_{hot}$), so that an electrical voltage is generated between both ends of each
thermocouple when a temperature gradient $\Delta T = T_{hot} - T_{cold}$ exists. The total
voltage at the thermopile output is the number of thermocouples times the voltage across
one of them.
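
As a rough numerical illustration of the thermopile relation just described, the sketch
below multiplies the voltage of one thermocouple by the number of thermocouples in
series. It assumes the usual Seebeck relation for one n/p couple, $V = (\alpha_p -
\alpha_n)\,\Delta T$, which the text implies but does not write out. The function name
and all numerical values are hypothetical and are not taken from the fabricated ETC.

```python
def thermopile_voltage(n_couples, alpha_p, alpha_n, t_hot, t_cold):
    """Return the open-circuit thermopile voltage in volts.

    n_couples: number of thermocouples connected in series
    alpha_p, alpha_n: Seebeck coefficients (V/K) of the p- and n-type legs
    t_hot, t_cold: temperatures (K) near the heater and at the substrate
    """
    delta_t = t_hot - t_cold                   # gradient seen by each couple
    v_couple = (alpha_p - alpha_n) * delta_t   # assumed Seebeck voltage of one couple
    return n_couples * v_couple                # series connection adds the voltages


# Hypothetical values: 10 couples, +200 uV/K and -100 uV/K legs, 10 K gradient.
v_out = thermopile_voltage(10, 200e-6, -100e-6, 310.0, 300.0)
print(f"thermopile output: {v_out * 1e3:.1f} mV")
```

Because the couples are in series, doubling their number doubles the output for the same
temperature gradient, which is why the thermopile rather than a single thermocouple is
used as the sensing element.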

Surface  micromachining  technologies  are  having  increasing  success  in  the  fabrication 
of  complex  MEMS.  Typically,  microaccelerometers  based  on  sensing  capacitance  changes 
and  microfilters  can  be  produced.  Devices  including  many  suspended  elements,  such  as 
electrostatic  comb-drives  and  microgears  for  microengines,  are  also  appearing  together 
with  libraries  of  suspended  elements  and  adequate  structured  design  methodologies  [2]. 
Surface  micromachining  is  based  on  the  deposition  of  thin  films  on  the  surface  of  the 
wafer.  The  micromechanical  layer  is  normally  polysilicon  or  nitride.  Polysilicon  is  a 
material  frequently  used  for  both  electronics  and  sensors.  Silicon  nitride  is  the  second 
main material used in mechanical layers since it is mechanically strong with low optical
absorption. The stress level in both materials is highly dependent upon both deposition
parameters and subsequent thermal processing. A sacrificial layer of a material such as
silicon oxide, polysilicon, porous silicon or aluminum is also deposited. A postprocessing
operation removes this sacrificial layer to suspend the micromechanical part. Various
types of oxides including thermal oxide, LPCVD (LTO, PSG, BPSG) and PECVD can be
used, each having advantages and disadvantages in terms of quality, etch rate or thickness
uniformity.

Figure 1(b) shows a cross section of a microstructure fabricated via the MCNC
Multi-User MEMS Process (MUMPs) [3]. This is a MEMS-specific technology, using
three polysilicon levels: Poly 0 on nitride for electrodes and interconnections, and thick
Poly 1 and Poly 2 for structural layers. A metal layer (on Poly 2) is used for optical and
electrical purposes. Contact cuts in the phosphosilicate glass (PSG) layer allow for the
formation of mechanical anchor points, which fix a microstructure to the silicon substrate.
The microstructures are suspended by immersion of the chip in a bath of HF (hydrofluoric
acid). This wet chemical etch removes the sacrificial PSG layers that encapsulate the
movable parts. In Figure 1(b), the Poly 2 structural layer will be released after removing
the sacrificial oxides underneath. This technology allows many applications such as
accelerometers, micromechanical resonators or electrostatic micromotors.

Figure 2(c) shows an example MEMS resonator. MEMS resonators are being proposed
for highly selective micromechanical filtering for wireless communications and high-Q
oscillators [4]. As shown in the schematics of Figure 2(d), the resonator is a mechanical
mass-spring-damper  system  consisting  of  a  central  shuttle  mass  that  is  suspended  by  two 
folded-beam  flexures.  The  topology  of  the  suspension  is  designed  to  be  compliant  in  the  x 
direction  (direction  of  motion),  and  to  stiffen  against  y-direction  and  torsional  movement 
to  keep  the  fingers  of  the  comb-drive  transducers  aligned.  The  comb-drive  transducers 
(interdigitated  finger  structures)  are  used  for  exciting  and  sensing  vibration  parallel  to 
the  plane  of  the  substrate  [5].  They  are  DC  biased  with  a  voltage  Vp  applied  to  the  shuttle 
mass  via  the  anchor  points  of  the  suspension  (three  pads  can  be  seen  in  the  resonator  of 
Figure  2(c)  for  input,  output  and  DC  bias).  An  electrostatic  driving  force  is  generated 
by  the  input  voltage,  which  can  make  the  resonator  vibrate  when  the  input  frequency  is 
close  to  the  resonance  frequency  of  the  suspended  microstructure.  The  vibrations  of  the 
central  mass  generate  a  current  in  the  output  comb-drive  capacitive  transducer. 
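
The behaviour described above can be sketched with the textbook second-order
mass-spring-damper model: the displacement amplitude peaks when the drive frequency is
close to the resonance frequency. The code below is only an illustration under that
lumped-model assumption. The stiffness, mass, quality factor and force values are
hypothetical numbers chosen to land near 300 kHz; they are not measured parameters of the
device in Figure 2(c).

```python
import math


def resonance_frequency(k, m):
    """Undamped resonance frequency (Hz) of a mass m (kg) on a spring k (N/m)."""
    return math.sqrt(k / m) / (2.0 * math.pi)


def displacement_amplitude(force, k, m, c, freq):
    """Steady-state displacement amplitude (m) for a sinusoidal force (N).

    c is the viscous damping coefficient (N s/m) and freq the drive frequency (Hz).
    """
    w = 2.0 * math.pi * freq
    return force / math.sqrt((k - m * w * w) ** 2 + (c * w) ** 2)


# Hypothetical lumped parameters for a small shuttle mass on folded-beam flexures.
k = 17.77            # total suspension stiffness, N/m
m = 5.0e-12          # shuttle mass, kg
q_factor = 1000.0    # assumed quality factor
c = math.sqrt(k * m) / q_factor
force = 1.0e-9       # electrostatic drive force amplitude, N

f0 = resonance_frequency(k, m)
for f in (0.9 * f0, f0, 1.1 * f0):
    x = displacement_amplitude(force, k, m, c, f)
    print(f"f = {f / 1e3:7.1f} kHz  amplitude = {x:.3e} m")
```

At resonance the amplitude is approximately Q times the static deflection force/k, which
is the selectivity that makes such structures attractive for filtering.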

III.  Design  methodology 

In  general,  MEMS  design  is  a  complex  process  which  is  not  approached  in  the  traditional 
hierarchical  and  analytical  style  of  microelectronics.  It  often  requires  solving  for  strongly 
coupled  non-linear  partial  differential  equations.  Thus,  MEMS  devices  are  often  designed 
at  a  low  level  of  abstraction  using  exact  computational  methods  or  at  a  high  level  of 
abstraction  using  signal  flow  analysis.  On  one  hand,  signal  flow  analysis  does  not  provide 
a  direct  linkage  between  physical  layout  and  behavioral  simulation  due  to  the  high  level 
of  abstraction.  On  the  other  hand,  computational  techniques  such  as  FEM  are  very 
general  but  they  are  arduous  and  time  consuming  for  system  design,  due  to  the  low  level 
of  abstraction  and  the  lack  of  design  hierarchy. 

FEM models are difficult to construct and simulation is very time consuming. In the
case of the ETC converter, a coupled FEM electrothermal analysis gives back a temperature
map of the cantilever as shown in Figure 3(a). The global geometry of the ETC is
represented by a set of nodes coupled with their neighbors, each node having in general
several degrees of freedom. Values of thermal capacitance and thermal conductances are
required for the beam material. Boundary conditions are imposed on substrate nodes
which must be at thermal ground. The loads applied to the system include the voltage
difference at the resistor inputs and heat convection on surfaces with air contact. For the
microresonator, a coupled electromechanical FEM analysis is very difficult to perform.
Mechanical FEM simulations can be performed to check for stress and deformations in
the microstructure shuttle and the supporting folded beams. The beams must be
sufficiently wide in order to support a maximum stress in the regions indicated in
Figure 3(b), given that their thickness is fixed by the MUMPs fabrication process.




Fig.  3.  FEM  analysis:  (a)  temperature  map  of  the  ETC  cantilever  for  0.5V  DC  input, 
and  (b)  stress  analysis  for  the  microresonator. 


Moving to intermediate levels of abstraction can align MEMS component modeling with
VLSI component modeling, leveraging similar research and development such as circuit
simulation. This is possible for some classes of MEMS devices that are accurately
described by ordinary differential equations and represented as networks of lumped
parameter elements. This will become clearer below for the electromechanical and
electrothermal devices described above. The networks describing these systems obey basic
energy conservation laws: the summation of across quantities (voltage, displacement
or temperature differentials) around a closed loop equals zero, and the summation of
through quantities (current, force or heat) at any network node equals zero.
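
To make the node law concrete, the following sketch sets up a small thermal network and
solves for the node temperatures by requiring that the heat flows (through quantities)
sum to zero at every node. It is a minimal illustration written for this overview, not
the simulator used by the authors; the network topology, the function names and the
numerical values are assumptions.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def nodal_temperatures(n_nodes, branches, sources):
    """Return the temperature rise (K) of every node; node 0 is thermal ground.

    branches: list of (node_a, node_b, thermal_conductance_W_per_K)
    sources:  dict mapping a node to the heat (W) injected into it
    """
    size = n_nodes - 1
    g = [[0.0] * size for _ in range(size)]
    p = [0.0] * size
    for a, b, cond in branches:
        for i, j in ((a, b), (b, a)):
            if i != 0:
                g[i - 1][i - 1] += cond          # heat leaving node i
                if j != 0:
                    g[i - 1][j - 1] -= cond      # coupling to neighbour j
    for node, heat in sources.items():
        p[node - 1] += heat
    return [0.0] + solve_linear(g, p)


# Hypothetical beam split into three segments of 2e-5 W/K, 1 mW injected at the tip.
branches = [(0, 1, 2e-5), (1, 2, 2e-5), (2, 3, 2e-5)]
temps = nodal_temperatures(4, branches, {3: 1e-3})
print([round(t, 3) for t in temps])
```

The same node-balance formulation applies unchanged to currents in an electrical network
or to forces in a mechanical one, which is what allows a circuit simulator to handle all
three domains.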

Structured design methodologies can be based on an underlying A-HDL language
that provides constructs for defining sets of simultaneous ordinary differential
equations [6]. This approach is important for efficiency and compatibility with standard
IC design [2], providing a linkage between physical layout and behavior which is also
required for realistic fault simulation and testing. Such a methodology, which can be made
possible by the emergence of stable MEMS fabrication processes, must include CAD tools
such as libraries of characterized MEMS elements, mixed-technology simulators, anisotropic
etching simulators, layout synthesis tools or design rule checking tools. Research efforts
can  then  shift  towards  the  design  of  systems  of  increasing  complexity,  focusing  on  higher 
level  design  issues.  Figure  4  illustrates  the  place  of  the  circuit-level  design  approach 
within  an  overall  design  methodology  for  microsystems. 

Fig. 4. Levels of abstraction in the microsystem design flow (legend: traditional
microsystem design; design approach under research).

For the ETC converter, an equivalent circuit-level description is given in Figure 5(a).
The model contains four electric pins which correspond to the inputs of the heating
resistor and the outputs of the thermopile. The heat transfer in the microstructure is
modeled by means of a one-dimensional heat transmission line, as suggested by the FEM
results of Figure 3(a). The heating power generated in the input resistor is modeled as a
heat source coupled to the transmission line, where each thermal element models the
thermal conductances and thermal capacitance of one beam portion. Thermal conduction,
convection and radiation losses are modeled by means of equivalent thermal resistors.
The substrate of the microstructure is treated as a heat sink at room temperature and
corresponds to electrical ground. Figure 5(d) shows the simulation of the ETC when a
pulse excitation of 1 V amplitude and 25 ms width is applied as input. Each curve
shows the evolution in time of the temperature (in K) at a different node of the beam;
the higher the curve, the closer the node is to the input resistor.
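
The transmission-line model described above can be imitated with a short time-stepping
sketch: each beam portion becomes one node with a thermal capacitance, conductances to its
neighbours and a loss conductance to ambient, and the heater injects power at the tip
during the pulse. This is an illustrative explicit-Euler integration with hypothetical
element values (including the assumed 1 mW of heating power), not the HDL model or the
parameters behind Figure 5(d).

```python
def simulate_ladder(n_nodes, g_cond, g_loss, c_th, power, t_pulse, t_end, dt):
    """Integrate a 1-D thermal RC ladder and return [(time, [T1..Tn]), ...].

    Node 0 is the substrate (thermal ground); node n_nodes carries the heater.
    Temperatures are rises above the substrate, in kelvin.
    """
    temps = [0.0] * (n_nodes + 1)
    n_steps = int(round(t_end / dt))
    n_on = int(round(t_pulse / dt))
    history = []
    for step in range(n_steps):
        p_in = power if step < n_on else 0.0
        new = temps[:]
        for i in range(1, n_nodes + 1):
            # Net heat flow into node i (sum of through quantities).
            q = g_cond * (temps[i - 1] - temps[i]) - g_loss * temps[i]
            if i < n_nodes:
                q += g_cond * (temps[i + 1] - temps[i])
            else:
                q += p_in
            new[i] = temps[i] + dt * q / c_th
        temps = new
        history.append(((step + 1) * dt, temps[1:]))
    return history


# Hypothetical values: 5 segments, 1e-4 W/K links, 1e-7 J/K per node, 25 ms pulse.
hist = simulate_ladder(5, 1e-4, 1e-6, 1e-7, 1e-3, 0.025, 0.05, 1e-5)
t_end_pulse, profile = hist[2499]
print(f"t = {t_end_pulse * 1e3:.1f} ms:", [round(t, 2) for t in profile])
```

With these values the time step is far below the element time constant c_th / g_cond, so
the explicit scheme remains stable; a stiffer network would call for an implicit method.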

Fig. 5. MEMS HDL-based simulation: (a) thermal circuit for the ETC cantilever, (b)
system-level description of the microresonator, (c) mass-spring-damper system for
circuit-level mechanical analysis, (d) circuit-level simulation of the ETC with a 1 V,
25 ms input pulse, and (e) system-level simulation of the microresonator with 100 mV input.

Circuit-level simulation of the microresonator involves decomposing the device into
components having electrical and mechanical interfaces. Each component contains nodes
for coupling with other components. Circuit simulation solves for system across variables
in each node (for example, position x, y, angle θ and voltage) by making the sum of through
variables in each node (forces in x and y directions, moment about θ and current) equal to
zero. Component models which relate multi-domain through variables in terms of across
variables can be built with an A-HDL language, and structured design schematics can
be captured. A first approach is shown in Figure 5(b). Components include two comb-drive
transducers, one mass-spring-damper system, and a transimpedance amplifier for
current-to-voltage conversion. A simple one-dimensional (x-translation) HDL-A model
for the comb-drive component is used. The results of AC simulation for the mechanical
displacement of the shuttle and the output voltage are shown in Figure 5(e), illustrating
the band-pass filtering behavior. A more refined circuit-level approach requires the
analysis of the interactions between basic components such as beam flexures, electrostatic
gaps, plate masses and anchors. As an example, Figure 5(c) shows the schematic capture
of a microresonator mass-spring-damper system for circuit-level analysis. A beam flexure,
for example, is represented as an element having two nodes with three mechanical
across variables in each node (x, y and θ). Since elements directly linked to the layout
are now available, the injection of the most realistic faults is possible. For example,
faults resulting from the break of a beam or an anchor point or the stiction of a beam can
be readily injected in such a circuit.
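
The idea of injecting such faults into a lumped mechanical description can be sketched as
follows. The shuttle is assumed to hang on flexures acting as parallel springs; a broken
flexure is modeled by removing its stiffness and a stuck flexure by clamping the shuttle.
This is a deliberately simplified, hypothetical one-dimensional model with made-up values,
far coarser than the three-variable nodes of Figure 5(c).

```python
import math


def shuttle_state(flexure_stiffness, mass, fault=None):
    """Return (resonance_hz, movable) for a shuttle held by parallel flexures.

    fault is None for the fault-free device, ("open", i) for a broken
    flexure i, or ("short", i) for flexure i stuck to the substrate.
    """
    stiffness = list(flexure_stiffness)
    if fault is not None:
        kind, index = fault
        if kind == "open":
            stiffness[index] = 0.0        # broken beam carries no force
        elif kind == "short":
            return None, False            # node clamped to mechanical ground
        else:
            raise ValueError(f"unknown fault kind: {kind}")
    k_total = sum(stiffness)              # parallel springs add their stiffness
    return math.sqrt(k_total / mass) / (2.0 * math.pi), True


# Hypothetical device: two identical flexures of 8.885 N/m, 5e-12 kg shuttle.
flexures = [8.885, 8.885]
for fault in (None, ("open", 0), ("short", 1)):
    print(fault, shuttle_state(flexures, 5.0e-12, fault))
```

In this sketch a single broken flexure lowers the resonance frequency by a factor of
$\sqrt{2}$, so a frequency-signature test would flag it even though the device still
moves, whereas stiction removes the resonance altogether.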

IV.  MEMS  Testing 

MEMS  devices  fabricated  today  are  tested  functionally.  Being  essentially  analog  devices, 
their  test  is  being  approached  by  means  of  techniques  from  the  analog  test  domain.  In 
general,  a  MEMS  device  must  be  tested  as  an  entity  after  manufacturing,  with  all  modules 
in  interaction  and  after  packaging.  For  instance,  additional  stresses  due  to  packaging 
can  significantly  impact  device  behavior.  Since  encapsulation  of  MEMS  is  often  a  very 
critical  issue,  accounting  sometimes  for  even  over  80%  of  the  overall  cost,  testing  a  device 
should also be considered before packaging, screening out defective devices as early as
possible. Functional testing of large volume chips embedding MEMS risks being extremely
expensive. As for purely microelectronic chips, the development of cost-effective tests for
large  volume  chips  embedding  MEMS  may  well  require  test  stimuli  targeting  actual  faults, 
developing  fault  lists  and  fault  models  for  realistic  defects,  and  using  fault  simulation  as  a 
major  approach  for  assessing  testability  and  dependability.  This  is  illustrated  in  Table  I. 


Tab. I. Testing of integrated systems

Integrated systems    Digital circuits          Analog and mixed-signal   Microsystems
testing               (1980s-...)               circuits (1990s-...)      (2000s-...)
--------------------  ------------------------  ------------------------  ------------------------
Failure mechanisms,   gate-oxide breakdown, electrical overstress,        ... and in addition:
failure modes,        contaminants, latchup, ...                          micromachining defects,
fabrication defects                                                       fatigue, friction ...

Fault models          stuck-at, stuck-on,       parametric faults,        ... and in addition:
                      stuck-open, bridge ...    catastrophic faults       shorts & opens in the
                                                (electrical shorts &      thermal, mechanical ...
                                                opens)                    domains

Test techniques       fault simulation          fault simulation          modeling of microsystems
                      (concurrent, VHDL,        (sequential, HSPICE,      at circuit level (HDLs)
                      HSPICE);                  SABER, VHDL-AMS);         &
                      ATPG (PODEM, ...);        ATPG (sensitivity ...);   transposition of
                      diagnostic (fault         diagnostic (e.g.          techniques developed for
                      dictionary, boolean);     frequency signature);     microelectronics
                      BIST (scan path, LFSR,    BIST (no general
                      signature ...)            solution)


The distinction between parametric and catastrophic faults typical of analog
electronics testing appears again for MEMS. In MEMS parts, parametric faults due to
deviations of electrical, mechanical or thermal parameters seem to dominate over
catastrophic faults. For example, faults caused by defects originating from particulate
contamination, which are of much relevance in microelectronic parts, are of lesser concern
for micromechanical parts, at least in fabrication steps prior to dicing and packaging. On
the other hand, new technological steps such as silicon micromachining introduce new types
of defects and failure mechanisms specific to MEMS parts.

Figure 6 shows some example bulk and surface micromachining defects. Typical defects
occurring during bulk micromachining include an inadequate release of a suspended
microstructure. This can be due to the presence of oxide residuals which prevent etching,
insufficient etching time, slow etching rate because of an inadequate solution (e.g.
formation of hillocks), or re-depositions of etched material which may occur after
micromachining [7]. Figure 6(a) shows an ETC cantilever which has not been fully suspended
due to insufficient etching time. Through the silicon dioxide beam, which is
semi-transparent to the electron microscope, it is possible to see the triangular shape of
the silicon material under the beam which has not yet been removed. This defect results in
an incorrect cantilever temperature map. Figure 6(b) shows the formation of hillocks
(pyramidal protuberances) at the bottom of a microcavity. Hillocks reduce the etching rate,
and the microbridge of Figure 6(b) is not fully freed in its central part.

Stiction has by far the largest impact on the yield of surface micromachining
technologies. It occurs mostly when the structure is released. Figure 6(c) illustrates the
beams of a microresonator stuck down on the substrate. A broken finger in a comb-drive
transducer, as shown in Figure 6(d), may subject the shuttle to additional torques, giving
mechanical instability of the vibrating mass, and to unwanted lateral forces, leading to
impaired functionality or failure.

Fig. 6. Typical micromachining defects: (a) ETC with a cantilever only partially
released, (b) pyramidal protuberances at the bottom of a microcavity etched with TMAH
25%, (c) stiction in a surface micromachined comb-drive, and (d) finger break.

By adequately modeling fault effects for simulation, integration of design and test can
be envisaged, selecting adequate test patterns which optimize defect and fault coverage
and facilitate diagnosis. Deriving adequate fault models is obviously linked to the level of
description and procedures used for their simulation. By modeling faults at the circuit
level, techniques already developed for the testing of analog microelectronic circuits can
be transposed to MEMS. For example, Table II describes fault models which can be used
for modeling the fault effects of the above defects.

For suspended thermal MEMS, faults can be modeled conveniently using fault models
such as thermal shorts and thermal opens. For example, faults caused by the micromachining
defects shown in Figure 6(a) are modeled using thermal shorts between the
suspended cantilever and the substrate. For the microresonators, faults resulting from
the break of a beam, the break of a joint or an anchor point, or the stiction of a beam
can be readily injected in a mechanical circuit, such as that shown in Figure 5(c), using
the fault models of Table II.
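
A thermal short of the kind just described can be illustrated on a steady-state ladder
model of a suspended beam: the fault is an extra conductance from one beam node to the
substrate, and its effect is read from the tip temperature. The sketch below is a
hypothetical example with assumed element values; the tridiagonal solver is the standard
Thomas algorithm and is not taken from the paper.

```python
def ladder_temperatures(n_nodes, g_cond, power, shorts=None):
    """Steady-state temperature rises (K) of beam nodes 1..n_nodes.

    The substrate is thermal ground, the heater injects `power` (W) at the
    last node, and `shorts` maps a node number to an extra fault conductance
    (W/K) tying that node to the substrate.
    """
    shorts = shorts or {}
    diag = []
    for i in range(1, n_nodes + 1):
        g_self = g_cond + shorts.get(i, 0.0)   # link towards substrate plus fault
        if i < n_nodes:
            g_self += g_cond                   # link towards the tip
        diag.append(g_self)
    rhs = [0.0] * n_nodes
    rhs[-1] = power
    off = -g_cond                              # coupling between neighbours
    for i in range(1, n_nodes):                # forward elimination
        w = off / diag[i - 1]
        diag[i] -= w * off
        rhs[i] -= w * rhs[i - 1]
    temps = [0.0] * n_nodes
    temps[-1] = rhs[-1] / diag[-1]
    for i in range(n_nodes - 2, -1, -1):       # back substitution
        temps[i] = (rhs[i] - off * temps[i + 1]) / diag[i]
    return temps


# Hypothetical beam: 3 segments of 2e-5 W/K and 1 mW of heating at the tip.
nominal = ladder_temperatures(3, 2e-5, 1e-3)
faulty = ladder_temperatures(3, 2e-5, 1e-3, shorts={2: 1e-3})
print("fault-free tip rise:", round(nominal[-1], 2), "K")
print("tip rise with thermal short at node 2:", round(faulty[-1], 2), "K")
```

A very large fault conductance pins the affected node at the substrate temperature, which
corresponds to the "stuck-at ground temperature" entry of Table II, while smaller values
behave like parametric deviations of the thermal resistance.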

Tab. II. Example fault models for modeling the effect of micromachining
defects in suspended thermal MEMS and microresonators

Class of MEMS      Defects             Fault model       Typical example
-----------------  ------------------  ----------------  ---------------------------------------
Suspended thermal  inadequate release  thermal short     nodes of suspended part stuck-at
MEMS                                                     ground temperature (body)
                   break               thermal open      thermal path broken by break of
                                                         suspended beam
Microresonators    stiction            mechanical short  nodes stuck-at ground, movement impeded
                   break               mechanical open   joint or anchor break, floating nodes

V. Conclusions

Current research efforts in MEMS development are trying to bring the analysis, design
and test of microsystems in line with microelectronics. Higher levels of design abstraction
are sought, with a cleaner separation between technological processing and design
effort. By relying on adequate design tools and methodologies, large and reliable MEMS
systems can be envisaged. Clearly, this next generation of highly integrated systems poses
important test challenges. Test technology already developed for microelectronics will
also need to tackle the MEMS parts if the applications are to justify their cost. This
paper has tried to illustrate current steps towards aligning MEMS development
with that of microelectronics.

An Object Oriented Methodology for Hardware Design

Abstract. Currently, no design environment exists that supports the design
of  a  complex  digital  chip  all  the  way  from  initial  design  exploration  down  to  and 
including  the  detailed  synthesis  phase.  In  this  paper,  an  object  oriented 
programming  approach  for  the  design  of  complex  systems  in  hardware  is 
presented  that  covers  the  whole  trajectory.  It  is  shown  how  the  usage  of  object 
oriented  techniques  resolves  some  major  obstacles  for  system-on-chip  design.  The 
design  of  a  10  Mbit/s  upstream  cable  modem  is  used  as  a  driving  example. 


I.  Introduction 

The continuous evolution in ASIC technology allows the integration of complete
telecom systems on silicon. This includes the complete information-processing path, starting
from the physical transport of bits, over network layer processing, down to user-level multimedia
presentation.

Unfortunately, the same cannot be said of the tools that are needed to design these
digital systems. In fact, the tools required to design, for instance, a mobile phone ASIC are
far from fitting onto the hard disk of a single high-end Personal Computer.

For  one  thing,  this  is  due  to  the  fact  that  each  design  discipline  (like  DSP  algorithm 
design,  network  performance  simulation,  hardware  synthesis  and  embedded  software  design) 
has  its  own  tool  and/or  favourite  environment.  As  a  result,  a  system  level  design  flow  is  a 
patchwork  of  scripts,  translators,  and  tools. 

In addition, many tasks in the construction of an ASIC are still more of an art than a
method.  Think  for  instance  of  bringing  a  DSP  algorithm  designed  in  Matlab  to  a  hardware 
implementation.  There  is  a  multitude  of  paths  leading  to  a  possible  solution,  and  even  more  of 
them  leading  to  a  dead  end. 

We observe that there are some major obstacles to successful system-on-chip
design today.

•  There  is  a  lack  of  a  single  system-level  environment  that  can  be  used  throughout  the 
design  flow.  While  algorithms  might  be  designed  at  high  level  in  C,  gates  still  have  to  be 
synthesised  out  of  HDL.  Each  manual  format  translation,  no  matter  how  small,  is  a 
possible  source  of  errors. 

•  The  designer  has  insufficient  control  over  the  design  process.  He  or  she  has  to  accept  the 
result  that  (synthesis)  tools  produce.  This  is  a  result  of  those  tools  being  sold  as  closed 
boxes.  Assembling  a  system  level  design  flow  out  of  such  tools  however  requires  an  open 
environment. 

• There is a lack of a systematic verification strategy. There are as many testbenches as there
are tools used in the design flow. Especially at phases in this flow where drastic changes
are made to the design representation (e.g. during the transition from Matlab to Verilog),
the development of corresponding or equivalent testbenches is extremely hard, if
possible at all.

Being stuck with this situation in several recent demonstrator designs, we turned
towards object-oriented C++ technology [6,7]. This allowed us to overcome all of the
obstacles that were mentioned. The use of C++ has been demonstrated for the modeling and
simulation of parallel hardware systems [1]. Our environment also provides VHDL code
generation and HDL testbench generation.

In this paper, an overview of the resulting C++ design environment (called OCAPI) will
be given in addition to the discussion of a concrete design experience. We start by giving an
example of C++ based design, and contrast it to traditional VHDL design. Next, the
scope is broadened towards using C++ for a complete system level design flow. This is finally
illustrated by the design of an upstream cable modem.


II. Hardware design with C++

When designing a system on chip in C++, we need to take care of both the hardware and
software parts of this system. While C++ is a logical choice to devise the software parts, using
it for hardware descriptions is less obvious. The match is however closer than one might think
at first. The object-oriented capabilities of C++ allow us to write down a representation in
terms of objects that closely resemble the actual intended circuit. To illustrate this point,
consider the design of a simple incrementer circuit. It is desired to create a synchronous,
digital machine, with one controller sending instructions to a datapath. The descriptions in
Table 1 show the result in both traditional VHDL coding and in our C++ design environment.


Incrementer in VHDL                      | Incrementer in C++
-----------------------------------------+--------------------------------
architecture RTL of my_processor is      | #include "ocapi.h"
begin                                    |
  SYNC: process (clk)                    | void main () {
  begin                                  |   sig a(ck);       // register
    if (clk'event and clk = '1') then    |
      current <= next_state;             |   sfg reset;       // instruction
      a_at1 <= a;                        |     a = 0;
    end if;                              |
  end process;                           |   sfg inc;
                                         |     a = a + 1;
  COMB: process (a, a_at1)               |
  begin                                  |   fsm f(ck);       // controller
    a <= a_at1;                          |   state state1;
    case current is                      |   state state2;
      when state1 =>                     |
        a <= 0;                          |   f << deflt(state1);
        next_state <= state2;            |   f << state2;
      when state2 =>                     |
        a <= a_at1 + 1;                  |   // state transitions
    end case;                            |   state1 << always
    if (reset = '0') then                |          << reset
      a <= 0;                            |          << state2;
    end if;                              |   state2 << always
  end process;                           |          << inc
end RTL;                                 |          << state2;
                                         | }


Table  1:  Comparing  equivalent  VHDL  and  C++  descriptions 
The circuit has a fairly standard description style in VHDL. We observe however that,
by making this VHDL description, the distinction between the controller (finite state machine)
and the data processing is lost. In addition, the constructs that are used to write VHDL
(processes, variables, case statements, etc.) bear little resemblance to the RT-level structure
of the incrementer circuit. The C++ description, shown on the right, approaches the
description from another side. It uses objects like sfg, state and fsm to reflect the exact design
concept that was intended. In this case, sfg creates a datapath instruction, while state and fsm
are parts of the controller. All these objects are related to each other through the use of C++
operators and expressions. As these operators execute, an object hierarchy is constructed that
reflects the RT behaviour of the processor.

As will be discussed further, we can simulate this object hierarchy and generate VHDL
code out of it. But it also allows us to design more effectively.
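
The way in which executing overloaded operators can build up a simulatable object hierarchy may be sketched in standard C++ as follows. The classes Instruction, State, Transition and EdgeBuilder are hypothetical stand-ins introduced only for this illustration; they are not the OCAPI classes, and the simplified cycle-based simulator is an assumption rather than the simulator of the actual environment.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT the OCAPI classes.
struct Instruction {                       // plays the role of an sfg
    std::string name;
    std::function<int(int)> op;            // next register value from the current one
};

struct State;

struct Transition {                        // one edge of the controller graph
    const Instruction* action;
    State* target;
};

struct State {
    std::string name;
    std::vector<Transition> edges;         // filled in while operator<< executes
};

// Helper that lets "state << instruction << next_state" be one expression.
struct EdgeBuilder {
    State* from;
    const Instruction* action;
};

inline EdgeBuilder operator<<(State& s, const Instruction& i) {
    return EdgeBuilder{&s, &i};
}

inline void operator<<(EdgeBuilder b, State& target) {
    b.from->edges.push_back(Transition{b.action, &target});
}

// Cycle-based simulation: each clock tick executes the instruction on the
// first edge of the current state and then moves to that edge's target.
inline int simulate(State& start, int reg, int cycles) {
    State* cur = &start;
    for (int c = 0; c < cycles; ++c) {
        assert(!cur->edges.empty());
        const Transition& t = cur->edges.front();
        reg = t.action->op(reg);
        cur = t.target;
    }
    return reg;
}

// Builds a two-state incrementer (reset, then increment forever) and runs it.
inline int run_incrementer(int cycles) {
    Instruction reset{"reset", [](int) { return 0; }};
    Instruction inc{"inc", [](int a) { return a + 1; }};
    State state1{"state1", {}};
    State state2{"state2", {}};

    state1 << reset << state2;             // the expressions themselves build the graph
    state2 << inc << state2;

    return simulate(state1, /*reg=*/123, cycles);
}
```

In this sketch, the register holds an arbitrary value before the first cycle, is cleared in the first cycle and is incremented in every later cycle. The same object graph could in principle be walked by a code generator instead of a simulator.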

III.  Design  flow 

In  this  section,  the  complete  system  level  design  flow  is  presented.  To  be  able  to 
construct  systems  on  chip  in  an  effective  way,  the  following  requirements  must  be  met. 

•  A  SoC  design  environment  must  be  able  to  capture  behaviour  at  high  level.  This  is  needed 
for  doing  algorithmic  design  and  exploration.  Therefore,  OCAPI  initially  captures 
behaviour  as  a  dataflow  description,  much  in  the  same  way  as  an  environment  like 
COSSAP  [9]  does.  One  difference  is  that  our  system  description  is  a  C++  program,  where 
current  environments  are  block-diagram  based. 

•  A  SoC  design  environment  must  offer  the  possibility  to  do  detailed  design  description  of 
hardware,  similar  to  traditional  HDL  environments.  OCAPI  includes  a  set  of  objects  that 
allow  describing  hardware  at  the  RT  level.  In  addition,  these  objects  can  be  co-simulated 
with  the  high-level  dataflow  description. 

• A SoC design environment should avoid making manual translations between equivalent
design representations. For this purpose, a C++ description made in terms of OCAPI
objects can be translated automatically to VHDL. In addition, VHDL testbenches and test
vectors are generated which can be used to repeat simulations in correspondence with the
C++ simulation.

• A SoC design environment must support incremental refinement which allows a smooth
transition from pure behavioural descriptions down to architecture descriptions. In OCAPI,
dataflow and architecture descriptions can be co-simulated. In addition, floating point
and fixed point datatypes can be freely mixed.

The design flow that we use is illustrated in Figure 1. The flow contains three major parts:
a system level design part, a hardware synthesis part and a hardware verification part.


A.  System  design 

The goal of the system design phase is to construct a functional RT-level model of the
ASIC under construction. For verification and test purposes, a system level environment
model is required. We will use a cable modem receiver example as we go through the entire
design flow. For the design of this receiver, the environment model consists of a transmitter
model and a channel model. This allows system level simulations and verification of the
receiver algorithms. These simulations are collected in a set of C++ test vectors that will be
reused in the hardware testbenches.
Figure 1: The C++ based system design flow

Initially, a floating-point data flow model of the complete system is constructed
(transmitter, channel model, receiver). Next, the receiver is refined to a cycle-true architecture
model. This is done by scheduling the operations of the high-level descriptions onto clock
cycles. In addition, bringing dataflow to hardware also requires the mapping of the dataflow
system-level semantics to an implementation. This is a standard design task for which several
solutions exist.

After the architecture has been obtained, the chip signal wordlengths are decided in
order to yield a cycle-true, bit-true architecture model. Fixed-point refinement is done by means
of simulation. The required refinement strategy is dependent on the type of application.
However, a good strategy for a digital receiver is the following one. First, a reception quality
metric (e.g. constellation purity) is determined using only quantization at the A/D side. Next,
the other wordlengths are decided such as to prevent overflow and to maintain the reception
quality metric. After these steps, the C++ model is a bit-true, clock-cycle-true representation of
the architecture.
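
The overflow part of such a simulation-based wordlength decision can be sketched as follows. The FixFormat type, the round-to-nearest and saturation behaviour, and the search loop are assumptions made for this illustration; they do not describe the fixed-point classes of OCAPI, and the reception quality metric is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical two's complement format: total_bits in all, frac_bits after the point.
struct FixFormat {
    int total_bits;
    int frac_bits;
};

struct QuantResult {
    double value;      // quantized value, converted back to floating point
    bool saturated;    // true if the input did not fit in the format
};

// Round to the nearest representable value and saturate on overflow.
inline QuantResult quantize(double x, FixFormat f) {
    assert(f.total_bits >= 2 && f.total_bits <= 62);
    const double scale = std::ldexp(1.0, f.frac_bits);                    // 2^frac_bits
    const std::int64_t max_code = (std::int64_t{1} << (f.total_bits - 1)) - 1;
    const std::int64_t min_code = -(std::int64_t{1} << (f.total_bits - 1));
    std::int64_t code = std::llround(x * scale);
    const bool sat = code > max_code || code < min_code;
    code = std::min(std::max(code, min_code), max_code);
    return QuantResult{static_cast<double>(code) / scale, sat};
}

// Smallest wordlength (for a fixed number of fraction bits) at which none of
// the simulated samples overflows; returns -1 if max_bits is not enough.
inline int min_wordlength(const std::vector<double>& samples, int frac_bits, int max_bits) {
    for (int w = std::max(frac_bits + 1, 2); w <= max_bits; ++w) {
        const bool ok = std::none_of(samples.begin(), samples.end(), [&](double s) {
            return quantize(s, FixFormat{w, frac_bits}).saturated;
        });
        if (ok) return w;
    }
    return -1;
}
```

In practice the sample vector would come from the system-level simulation, so that the chosen wordlength reflects the signal ranges that actually occur at that point of the receiver.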
Figure 2: Three possible simulations during system level design of a cable modem

The advantage of using C++ during system design becomes clear when considering Figure 2.
Since OCAPI allows algorithmic and architecture descriptions, as well as
floating and fixed point datatypes, to be freely mixed, the receiver description can be
co-simulated with the transmitter and channel model at various levels of receiver design detail.

The system level design phase concludes with the use of a code generator that creates,
out of the C++ description, the input for subsequent hardware synthesis and verification.

• For each block (FSMD) of the receiver, a synthesizable RT-VHDL file is created.

•  For  the  overall  chip,  a  system  netlist  is  generated  to  connect  the  various  blocks  in  the 
design. 

•  The  C++  test  vectors  are  translated  into  block-level  and  system-level  testbenches.  In 
addition,  appropriate  testbench  drivers  are  generated. 
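
As an illustration of the last item, the translation of recorded test vectors into a testbench driver might look like the sketch below. The PortTrace layout, the function name emit_stimulus and the shape of the emitted VHDL process are all assumptions chosen for this example; the format actually produced by the OCAPI code generator is not described here.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One recorded value per clock cycle for a single input port (hypothetical layout).
struct PortTrace {
    std::string port;            // name of the VHDL signal to drive
    std::vector<int> values;     // values recorded during the C++ simulation
};

// Emits a VHDL stimulus process that replays the trace, one value per rising
// clock edge. The driven signal is assumed to be declared as an integer in the
// enclosing testbench architecture.
inline std::string emit_stimulus(const PortTrace& t, const std::string& clock) {
    assert(!t.port.empty());
    std::ostringstream out;
    out << "STIM_" << t.port << ": process\n"
        << "begin\n";
    for (int v : t.values) {
        out << "  " << t.port << " <= " << v << ";\n"
            << "  wait until rising_edge(" << clock << ");\n";
    }
    out << "  wait;\n"              // suspend forever once the trace is exhausted
        << "end process;\n";
    return out.str();
}
```

Because the driver text is derived mechanically from the values seen in the C++ simulation, the HDL simulation replays exactly the same stimuli without manual translation.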

B.  Synthesis 

The  generated  VHDL  code  is  directly  fed  into  the  Synopsys  logic  synthesis  tools  [8].  It 
is  in  essence  a  fully  automated  process  that  can  be  run  in  batch. 

While this synthesis flow is industry standard, the C++ environment and the verification
that surround it are innovative.

C.  Verification 

During  synthesis,  simulation-based  verification  is  used  extensively  to  track  the 
correctness  of  the  synthesis  results.  All  VHDL-level  simulations  are  done  using  the  generated 
testbenches  with  the  Synopsys  VSS  simulator  at  block-level  and  system  level.  The  final 
Verilog  netlist  is  checked  using  generated  production  test  vectors  with  the  Cadence  Verilog- 
XL  simulator. 

Verification is done by C++ simulation during the system design phase and by HDL
simulation during the synthesis phase. There are 5 verification levels that correspond to the 5
description levels of the design. Three of them are in C++ (dataflow floating point, cycle-true
floating point and cycle-true fixed point). The remaining two are at the VHDL (RT-VHDL and
Synopsys-DC VHDL outputs) and Verilog (final netlist) levels. The design of testbenches is
done in C++, since corresponding HDL testbenches are obtained by code generation. As
shown by Figure 3, the test simulations can be categorised in three areas: performance tests,
functional tests, and equivalence tests.
Test categories: Performance, Functional

Verification level       Marks
C++ Dataflow FLP         X  X
C++ Architecture FLP     X
C++ Architecture FXP     X
Block HDL                X
System HDL               X

Figure  3:  Verification  strategies  at  different  levels 

The  performance  tests  are  used  to  check  the  initial  performance  of  the  design.  For  a 
digital  receiver,  test  scenarios  include  varying  levels  of  channel  noise,  phase  distortion,  carrier 
frequency  deviation,  amplitude  slope  distortion,  gain  variation  and  burst  spacing.  These  tests 
ensure  that  the  initial  algorithmic  model  has  the  desired  performance. 

The  functional  tests  check  the  correct  operation  of  a  design  within  one  verification  level. 
Typical  tests  include  for  instance  the  reception  of  a  known  data  sequence.  The  goal  of  these 
tests  is  to  perform  a  simulation  with  maximal  coverage  of  the  design  description. 

Equivalence  tests  compare  the  operation  of  one  level  to  the  next.  They  are  applied  at 
either  floating-point  level  or  else  fixed-point  level.  Equivalence  tests  do  a  one-to-one 
comparison  of  values  on  the  system  interconnect  at  corresponding  time-points. 
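
A one-to-one comparison of this kind can be sketched as below. The function name first_mismatch, the representation of a trace as a vector of doubles and the tolerance parameter are assumptions for this illustration; a tolerance of zero corresponds to an exact comparison.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the index of the first time-point at which the two traces differ by
// more than tol, or -1 if they agree everywhere. Traces of different length
// are reported as a mismatch at the end of the shorter one.
inline long first_mismatch(const std::vector<double>& ref,
                           const std::vector<double>& dut,
                           double tol) {
    assert(tol >= 0.0);
    const std::size_t n = ref.size() < dut.size() ? ref.size() : dut.size();
    for (std::size_t i = 0; i < n; ++i) {
        if (std::fabs(ref[i] - dut[i]) > tol) return static_cast<long>(i);
    }
    return ref.size() == dut.size() ? -1 : static_cast<long>(n);
}
```

Reporting the first differing time-point, rather than only pass or fail, makes it easier to locate the cycle at which two description levels start to diverge.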

IV.  An  upstream  cable  modem 

Using the OCAPI object library and the design flow discussed above, we have
completed several demonstrator designs including an upstream cable modem, a DECT
transceiver and an MPEG-4 image coder. Some details on the cable modem design are included
here to illustrate the power of C++ based design.

This upstream cable modem was developed in a research project in co-operation with
Siemens-Atea, Belgium [5]. An upstream cable modem receiver resides at the head-end of the
HFC television access network. HFC is a network architecture that is built up of coax and
fiber. In upstream HFC communications, modulated data is transmitted from the consumer
side to the head-end. The single chip digital receiver that we have developed is embedded in
the head-end and demodulates this data. It takes care of the physical layer signal processing
required for QAM16 or QPSK modulation. Offering 10 Mbit/s data throughput in a 3.3 MHz
upstream channel band, it is compliant with the main HFC communications standards
(MCNS/DOCSIS [2], DAVIC/DVB [3] and IEEE 802.14 [4]). The chip relies extensively on
digital signal processing to demodulate and decode upstream signal bursts. It also estimates
and automatically corrects various transmission impairments occurring on the upstream HFC
channel. Such impairments include varying signal levels, carrier frequency deviations, group
delay distortion and amplitude variation. In addition, the chip provides an interface for an
external Reed-Solomon channel decoder that combats channel noise effects.
A. Development

The chip was developed from scratch with the C++ based design flow. Therefore we
started by constructing a system level functional model in C++. This model includes a burst
transmitter, a channel model, and a receiver functional dataflow model. Such a system model
allows us to explore various receiver algorithms and to construct system level testbenches that
determine the overall system performance.
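
The structure of such a system model (transmitter, channel, receiver, and a testbench around them) can be sketched in plain C++ as follows. This is a deliberately reduced illustration and not the model used for the chip: it assumes QPSK only, a channel with just a gain and a fixed phase rotation, and a hard-decision receiver without any of the estimation and correction loops. All function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Symbol = std::complex<double>;

// Transmitter: maps bit pairs to QPSK symbols (+/-1 +/-j, unnormalised).
inline std::vector<Symbol> qpsk_modulate(const std::vector<int>& bits) {
    assert(bits.size() % 2 == 0);
    std::vector<Symbol> out;
    for (std::size_t i = 0; i + 1 < bits.size(); i += 2) {
        out.emplace_back(bits[i] ? -1.0 : 1.0, bits[i + 1] ? -1.0 : 1.0);
    }
    return out;
}

// Channel: a simple model with a gain and a fixed phase rotation. Noise and
// other impairments are left out to keep the sketch deterministic.
inline std::vector<Symbol> channel(const std::vector<Symbol>& in, double gain, double phase_rad) {
    const Symbol h = std::polar(gain, phase_rad);
    std::vector<Symbol> out;
    for (const Symbol& s : in) out.push_back(h * s);
    return out;
}

// Receiver: hard decision on the sign of the real and imaginary parts.
inline std::vector<int> qpsk_demodulate(const std::vector<Symbol>& in) {
    std::vector<int> bits;
    for (const Symbol& s : in) {
        bits.push_back(s.real() < 0.0 ? 1 : 0);
        bits.push_back(s.imag() < 0.0 ? 1 : 0);
    }
    return bits;
}

// System-level testbench: number of bit errors for a given channel setting.
inline int bit_errors(const std::vector<int>& bits, double gain, double phase_rad) {
    const std::vector<int> rx = qpsk_demodulate(channel(qpsk_modulate(bits), gain, phase_rad));
    int errors = 0;
    for (std::size_t i = 0; i < rx.size(); ++i) errors += (rx[i] != bits[i]);
    return errors;
}
```

Sweeping the channel parameters in such a testbench shows where an uncorrected receiver starts to fail, for example once the phase rotation pushes symbols into a neighbouring quadrant, which is the kind of observation that motivates adding correction algorithms to the receiver model.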




Figure  4:  Upstream  cable  modem  chip  architecture 

The resulting receiver, shown in Figure 4, has been optimised for minimal
communication overhead. For this purpose, it contains an advanced flash equalisation
algorithm that works with a short, fixed length preamble independent of the communication
channel conditions. It also makes the core particularly suited for multimedia applications, in
which wide ranges of bitrates need to be supported. All signal processing is done entirely
in the digital domain, making the performance reliable, predictable and free of tuning. This
signal processing is performed in a chain of independent blocks. This chain allows the reception
of QAM16/QPSK burst signals with an interburst spacing of only 4 symbols and a preamble of 17 symbols.
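
The effect of these numbers on communication overhead can be made concrete with a small calculation. The sketch below assumes that the preamble and the interburst spacing are the only per-burst overhead and that both are counted in symbols; the payload lengths used in the usage note are hypothetical, and the symbol rate example assumes that the 10 Mbit/s figure refers to QAM16 with 4 bits per symbol.

```cpp
#include <cassert>

// Fraction of channel symbols that carry payload when every burst costs a
// fixed preamble plus an interburst gap (all lengths counted in symbols).
inline double burst_efficiency(int payload_symbols, int preamble_symbols, int gap_symbols) {
    assert(payload_symbols > 0 && preamble_symbols >= 0 && gap_symbols >= 0);
    return static_cast<double>(payload_symbols) /
           static_cast<double>(payload_symbols + preamble_symbols + gap_symbols);
}

// Symbol rate needed for a given bit rate and number of bits per symbol.
inline double symbol_rate(double bit_rate, int bits_per_symbol) {
    assert(bits_per_symbol > 0);
    return bit_rate / bits_per_symbol;
}
```

Under these assumptions, a burst with a 64-symbol payload uses 64 of 85 symbols for data (about 75%), while a 21-symbol payload would reach only 50%, which illustrates why a short preamble matters most for short bursts. With 4 bits per symbol, 10 Mbit/s corresponds to 2.5 Msymbol/s.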

Care  was  taken  to  make  the  functionality  programmable.  An  I2C  programming  interface 
allows  in-the-field  configuration  and  adjusting  of  demodulation  parameters.  The  burst  payload 
length  is  programmable.  In  addition,  various  parameters  such  as  received  signal  power  and 
equaliser  coefficients  can  be  extracted  for  signal  quality  estimation. 

The complete development process, starting at algorithm design and ending with a
clock-cycle-true, bit-true architecture, was done in C++ with the OCAPI library. Once the
architecture model was available, synthesisable RT-VHDL code was generated to bring the
receiver circuit to a gate level implementation. The code size statistics of the chip in the
subsequent design phases are shown in Table 2. They illustrate the compactness that can be
achieved by using object orientation.


Specification          Lines
C++ Dataflow             922
C++ Architecture        4426
RT-VHDL                21798
Gate Level VHDL       154952

Table  2:  Cable  modem  code  size 


Using C++ based design we experienced tight control over the entire design flow. This
is because there is only a single environment in which system level design was performed,
including the development of testbenches. We ensured that the C++ simulations were correct,
after which only the equivalence between C++ and generated HDL had to be shown. This is
straightforward since the testbenches are generated from the C++ design too.

V. Conclusions

A design methodology for systems on silicon was presented that is based on object
oriented programming techniques. It was shown that the application of object oriented
programming to hardware design helps to alleviate some of the major obstacles to efficient
system level design:

• There is a single system-level design environment for both algorithm design and
architecture design. The transitions between the two levels are done by incremental
refinement.

• By using a programming language, the designer has full control over the [...].
This open environment provides a very effective way to deal with the diversity and [...]

• [...] verification is supported throughout the design
process. Three types of verification were identified.

The  design  of  an  upstream  cable  modem  using  this  methodology  has  resulted  in  a  short 
design  time  and  first  time  right  silicon. 